The predictive capability of a modification of Rissanen's accumulated
prediction error (APE) criterion, $\mathrm{APE}_{\delta_n}$, is investigated in
infinite-order autoregressive (AR($\infty$)) models. Instead of accumulating
squares of sequential prediction errors from the beginning, $\mathrm{APE}_{\delta_n}$
is obtained by summing these squared errors from stage $n\delta_n$, where $n$
is the sample size and $1/n \le \delta_n \le 1 - (1/n)$ may depend on $n$. Under
certain regularity conditions, an asymptotic expression is derived for
the mean-squared prediction error (MSPE) of an AR predictor with
order determined by $\mathrm{APE}_{\delta_n}$. This expression shows that the
prediction performance of $\mathrm{APE}_{\delta_n}$ can vary dramatically depending on the
choice of $\delta_n$. Another interesting finding is that when $\delta_n$ approaches
1 at a certain rate, $\mathrm{APE}_{\delta_n}$ can achieve asymptotic efficiency in most
practical situations. An asymptotic equivalence between $\mathrm{APE}_{\delta_n}$ and
an information criterion with a suitable penalty term is also established
from the MSPE point of view. This offers new perspectives for
understanding the information and prediction-based model selection
criteria. Finally, we provide the first asymptotic efficiency result for
the case when the underlying AR($\infty$) model is allowed to degenerate
to a finite autoregression.

1. Introduction. In the past two decades, investigations on the accumu- 
lated prediction error (APE) [21] and its variations have attracted consid- 
erable attention among researchers from various disciplines. Prior to the 
early 1990s, a large number of studies focused on its consistency in selecting 
regression or time series models (e.g., [6, 8, 26, 27, 29]). However, since prov- 
ing consistency requires assuming that the true model is included among the 
family of candidate models (which is rather difficult to justify in practice), 
recent research has focused more on understanding its statistical properties
under possible model misspecification (e.g., [3, 15, 17, 20, 29, 30], among
others). While a much deeper understanding of APE in cases of a mis- 
specified model has been gained from these recent efforts, APE's prediction 
performance after model selection still remains unclear. This motivated the 
present study. 

To select a model for the realization of a stationary time series, it is 
common to assume that the realization comes from an autoregressive moving 
average (ARMA) process whose AR and MA orders are known to lie within 
prescribed finite intervals. Then a model selection procedure is used to select 
orders within these intervals and thereby determine a model for the data. 
However, as pointed out by Shibata [25], Goldenshluger and Zeevi [5] and 
Ing and Wei [14], this assumption can rarely be justified in practice, and the 
less stringent assumption is that the time series data are observations from 
a linear stationary process. Following this idea, it is assumed in the sequel
that the observations are generated by an AR($\infty$) process $\{x_t\}$, where

(1.1)    $x_t + \sum_{i=1}^{\infty} a_i x_{t-i} = e_t, \qquad t = 0, \pm 1, \pm 2, \ldots,$

with the characteristic polynomial $A(z) = 1 + \sum_{i=1}^{\infty} a_i z^i \neq 0$ for all $|z| \le 1$
and $\{e_t\}$ being a sequence of independent random noise variables satisfying
$E(e_t) = 0$ and $E(e_t^2) = \sigma^2$ for all $t$. To predict future observations, we
consider a family of approximation models $\{\mathrm{AR}(1), \ldots, \mathrm{AR}(K_n)\}$, where the
maximal order $K_n$ is allowed to tend to $\infty$ as $n$ does in order to reduce
approximation errors. In this framework, the APE value of model AR($k$),
$1 \le k \le K_n$, is given by

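(1.2)    $\mathrm{APE}(k) = \sum_{i=m}^{n-1} (x_{i+1} - \hat{x}_{i+1}(k))^2,$

where $\hat{x}_{i+1}(k) = -\mathbf{x}_i'(k)\hat{\mathbf{a}}_i(k)$, $\mathbf{x}_i(k) = (x_i, \ldots, x_{i-k+1})'$, and $\hat{\mathbf{a}}_i(k)$ satisfies

(1.3)    $-\hat{R}_i(k)\hat{\mathbf{a}}_i(k) = \frac{1}{i - K_n} \sum_{j=K_n}^{i-1} \mathbf{x}_j(k) x_{j+1},$

with

(1.4)    $\hat{R}_i(k) = \frac{1}{i - K_n} \sum_{j=K_n}^{i-1} \mathbf{x}_j(k)\mathbf{x}_j'(k),$

and $m \ge K_n + 1$ is the first integer $j$ such that $\hat{\mathbf{a}}_j(K_n)$ is uniquely defined.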
As observed, APE($k$) measures the performance of AR($k$) when it is used
for sequential predictions. Recently, a modification of APE,

(1.5)    $\mathrm{APE}_{\delta_n}(k) = \sum_{i=n\delta_n}^{n-1} (x_{i+1} - \hat{x}_{i+1}(k))^2,$

with $1/n \le \delta_n \le 1 - (1/n)$ depending on $n$, has also been considered by
several authors, for example, West [30], McCracken [20] and Inoue and Kilian
[15]. Since $\mathrm{APE}_{\delta_n}$ includes the original APE as a special case, this paper
focuses on $\mathrm{APE}_{\delta_n}$. As will be shown later, the performance of $\mathrm{APE}_{\delta_n}$ can
vary dramatically depending on the choice of $\delta_n$.

In view of (1.5), it is natural to predict the next observation $x_{n+1}$ using
$\hat{x}_{n+1}(\hat{k}_{n,\delta_n})$, where

(1.6)    $\hat{k}_{n,\delta_n} = \arg\min_{1 \le k \le K_n} \mathrm{APE}_{\delta_n}(k).$

This type of prediction, targeting future values of the observed time series, is
referred to as a same-realization prediction. On the other hand, if the process
used in estimation (or model selection) and that for prediction are independent,
then it is called an independent-realization prediction (see [2, 16, 22]
and [25]). For differences between these two types of predictions in various
time series models, see [10, 11, 13, 14, 18]. The prediction performance of
$\mathrm{APE}_{\delta_n}$ after order selection is assessed using the mean-squared prediction
error (MSPE) $q_n(\hat{k}_{n,\delta_n})$, where, with $\hat{k}_n = \hat{k}_n(x_1, \ldots, x_n) \in \{1, 2, \ldots, K_n\}$,

(1.7)    $q_n(\hat{k}_n) = E(x_{n+1} - \hat{x}_{n+1}(\hat{k}_n))^2.$
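
To make the selection rule (1.5)-(1.6) concrete, the following Python sketch accumulates squared one-step forecast errors from stage $n\delta_n$ onward and returns the minimizing order. It is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`solve`, `ape`, `select_order`, `simulate_ar2`), the toy AR(2) series, and the choice to fit every candidate on pairs starting at index `k_max` are all introduced here for the example.

```python
import random


def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with pivoting."""
    m = len(b)
    M = [A[r][:] + [b[r]] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][j] * z[j] for j in range(r + 1, m))) / M[r][r]
    return z


def ape(x, k, start, k_max):
    """Accumulate squared one-step errors of AR(k) over i = start, ..., n-1.

    x[0] is padding, so x[i] plays the role of x_i and n = len(x) - 1.
    The forecast of x_{i+1} is fitted on pairs (x_j(k), x_{j+1}), j = k_max..i-1.
    """
    n = len(x) - 1
    G = [[0.0] * k for _ in range(k)]   # running sum of x_j(k) x_j(k)'
    g = [0.0] * k                       # running sum of x_j(k) x_{j+1}
    total = 0.0
    for i in range(k_max + 1, n):
        past = [x[i - 1 - l] for l in range(k)]       # regressor x_{i-1}(k)
        for a in range(k):
            g[a] += past[a] * x[i]
            for b in range(k):
                G[a][b] += past[a] * past[b]
        if i >= start:
            coef = solve(G, g)                         # equals -a_hat_i(k)
            forecast = sum(coef[l] * x[i - l] for l in range(k))
            total += (x[i + 1] - forecast) ** 2
    return total


def select_order(x, delta, k_max):
    """Return the order minimizing the accumulated error started at n*delta."""
    n = len(x) - 1
    start = max(int(n * delta), 2 * k_max + 1)   # guard: enough data to fit
    scores = {k: ape(x, k, start, k_max) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


def simulate_ar2(n, seed=0, burn=200):
    """Toy data (hypothetical): x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + e_t."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + burn):
        x.append(0.5 * x[-1] - 0.3 * x[-2] + rng.gauss(0.0, 1.0))
    return [0.0] + x[-n:]                         # index 0 is padding


if __name__ == "__main__":
    series = simulate_ar2(300)
    k_hat, scores = select_order(series, delta=0.5, k_max=4)
    print(k_hat, {k: round(v, 2) for k, v in scores.items()})
```

Because each forecast of $x_{i+1}$ uses only $x_1, \ldots, x_i$, raising `delta` drops early terms from the sum without changing the remaining ones.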

There are three interrelated issues addressed in this paper. The first one
focuses on the asymptotic expression for $q_n(\hat{k}_{n,\delta_n})$. To deal with this problem,
in Proposition 2 (see Section 2) we establish a general theory that provides
sufficient conditions under which $q_n(\hat{k}_n) - \sigma^2$ can be asymptotically
expressed as a sum of two terms that measure the model complexity and
the goodness of fit. This result is then applied to the case $\hat{k}_n = \hat{k}_{n,\delta_n}$ with $\delta_n$
bounded away from 1; see Theorem 1 in Section 3. A series of examples is
given after Theorem 1 to illustrate its implications. In particular, it is shown
in Example 1 that when the AR coefficients $\{a_i\}$ decay exponentially [which
includes, but is not limited to, the ARMA($p$, $q$) model with $q > 0$ as a special
case] and $\delta_n$ satisfies $\log \delta_n^{-1} = o(\log n)$, $\mathrm{APE}_{\delta_n}$ is asymptotically efficient in
the sense of (2.3). However, if the $\{a_i\}$ decay algebraically, Example 3 points
out that $\mathrm{APE}_{\delta_n}$ is no longer asymptotically efficient if $\delta_n$ is bounded away
from 1. To alleviate this difficulty, Theorem 2 (also in Section 3) allows $\delta_n$
to converge at a certain rate to 1 and offers a theoretical justification for the
proposed modification. In light of this result, a class of $\mathrm{APE}_{\delta_n}$ criteria that
can achieve asymptotic efficiency in both exponential and algebraic-decay
cases is given; see Examples 4 and 5 after Theorem 2.

The second issue concerns the performance of the information criterion
and its relation to $\mathrm{APE}_{\delta_n}$ from the same-realization prediction point of view.
The value of the information criterion for model AR($k$) is defined by

(1.8)    $\mathrm{IC}_{P_n}(k) = \log \hat{\sigma}_n^2(k) + \frac{P_n k}{N},$

where $P_n > 1$ is a positive number (possibly) depending on $n$,

(1.9)    $\hat{\sigma}_n^2(k) = \frac{1}{N} \sum_{t=K_n}^{n-1} (x_{t+1} + \mathbf{x}_t'(k)\hat{\mathbf{a}}_n(k))^2,$

and $N = n - K_n$. Note that the AIC [1], BIC [23] and HQ criteria [7] correspond
to $\mathrm{IC}_{P_n}$ with $P_n = 2$, $\log n$, and $c \log_2 n$, respectively, where $c > 2$
and $\log_2 n = \log(\log n)$. Equation (1.8) is referred to as an AIC-like criterion
if $P_n$ is independent of $n$, and as a BIC-like criterion if $P_n \to \infty$ and
$P_n = o(n)$. With the help of Proposition 2, Theorem 3 (see Section 4) gives
an asymptotic expression for $q_n(\hat{k}_{n,P_n})$, where

(1.10)    $\hat{k}_{n,P_n} = \arg\min_{1 \le k \le K_n} \mathrm{IC}_{P_n}(k).$

This result extends Corollary 1 of [14], which only focuses on the MSPE
of the AIC-like criteria. An interesting implication of Theorem 3 is that
the HQ criterion is asymptotically efficient in the exponential-decay case
whereas BIC is not; see Examples 6 and 7 in Section 4. While both HQ and
BIC are known to be consistent in finite-order AR models [7], these examples
show that their prediction performance can differ remarkably in the AR($\infty$)
case. Based on Theorems 1-3, an asymptotic equivalence between $\mathrm{IC}_{P_n}$ and
$\mathrm{APE}_{\delta_n}$, with $\delta_n$ and $P_n$ satisfying (4.8), is given at the end of Section 4; see
(4.9).
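
The criterion (1.8)-(1.10) can be sketched in the same spirit. The Python block below fits each AR($k$) by least squares on the common sample $t = K_n, \ldots, n-1$ and adds the penalty $P_n k / N$. The helper names, the toy data and the value `c = 2.5` for HQ are assumptions made for illustration only; the text requires no more than $c > 2$.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with pivoting."""
    m = len(b)
    M = [A[r][:] + [b[r]] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][j] * z[j] for j in range(r + 1, m))) / M[r][r]
    return z


def sigma2_hat(x, k, k_max):
    """Residual variance of the least-squares AR(k) fit over t = k_max..n-1.

    x[0] is padding, so x[t] plays the role of x_t and n = len(x) - 1.
    """
    n = len(x) - 1
    rows = [[x[t - l] for l in range(k)] for t in range(k_max, n)]
    y = [x[t + 1] for t in range(k_max, n)]
    G = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    g = [sum(r[a] * v for r, v in zip(rows, y)) for a in range(k)]
    coef = solve(G, g)                                # equals -a_hat_n(k)
    rss = sum((v - sum(c * u for c, u in zip(coef, r))) ** 2
              for r, v in zip(rows, y))
    return rss / len(y)                               # len(y) = N = n - k_max


def ic(x, k, k_max, penalty):
    """log(sigma2_hat) + penalty * k / N, following the form of (1.8)."""
    N = (len(x) - 1) - k_max
    return math.log(sigma2_hat(x, k, k_max)) + penalty * k / N


def select_order(x, k_max, penalty):
    """Order in 1..k_max minimizing the information criterion."""
    return min(range(1, k_max + 1), key=lambda k: ic(x, k, k_max, penalty))


def penalties(n, c=2.5):
    """Penalty weights named in the text; c = 2.5 is a hypothetical choice."""
    return {"AIC": 2.0, "BIC": math.log(n), "HQ": c * math.log(math.log(n))}


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 0.0, 0.0]
    for _ in range(400):                  # toy AR(2) data, hypothetical
        x.append(0.5 * x[-1] - 0.3 * x[-2] + rng.gauss(0.0, 1.0))
    n = len(x) - 1
    for name, p in penalties(n).items():
        print(name, select_order(x, 5, p))
```

Since all candidates share one sample, a heavier penalty can only move the selected order downward, which is the mechanism behind the AIC-like versus BIC-like distinction.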

The third issue in which we are interested is a long-standing unresolved
problem concerning time series model selection. Under the assumption that
(1.1) does not degenerate to an AR model of finite order, Ing and Wei [14]
recently showed that AIC, satisfying (2.3), is asymptotically efficient for
same-realization predictions. However, if the order of the underlying AR
model is finite, then, as mentioned previously, the BIC-like criteria (e.g.,
HQ and BIC) are consistent, but AIC, which asymptotically will choose
an overparameterized model with positive probability, does not possess this
property [24]. When the $\mathrm{APE}_{\delta_n}$ criteria are used instead, the choice between
$\delta_n \to 0$ and $\delta_n \to 1$ also leads to the same difficulty; see Remark 5 in Section
3. To tackle these dilemmas, in Section 5 we first concentrate on an important
special case where $\{a_i\}$ either decay exponentially or are zero for all but
a finite number of $i$. It is shown in Theorem 5 that $\mathrm{IC}_{P_n}(k)$, with $P_n \to \infty$
and $P_n = o(\log n)$, and $\mathrm{APE}_{\delta_n}(k)$, with $\delta_n^{-1} \to \infty$ and $\log \delta_n^{-1} = o(\log n)$, can
simultaneously achieve asymptotic efficiency over these two types of AR processes.
However, if the case where $\{a_i\}$ decay algebraically is also included,
then the criteria proposed by Theorem 5 fail to preserve the same optimality.
A two-stage procedure, (5.1), which is a hybrid between AIC and a BIC-like
criterion, is provided as a remedy. Its validity is justified theoretically in
Theorem 6 (also in Section 5). Note that the results mentioned above are
verified under the assumption that all moments of $e_t$ are finite [see (K.3)
in Section 2]. This assumption is made to obtain a uniform moment bound for $\hat{R}_n^{-1}(k)$
[see (B.6) in Appendix B], based on the recent work of Ing and Wei ([11],
Theorem 2; [14], Proposition 1). In fact, (K.3) can be slightly relaxed at the
cost of reducing the number of candidate models. However, the details are
not pursued here in order to simplify the discussion. Simulation results illustrating
finite sample performance of the aforementioned criteria are given in
Section 6. For ease of reading, the proofs of the results in Sections 2-5 are
deferred to Appendices A-D, respectively.

2. Preliminary results. We first list a set of assumptions that are used 
throughout the paper. 

(K.1) Let $\{x_t\}$ be a linear process satisfying (1.1) with $A(z) = 1 + a_1 z +
a_2 z^2 + \cdots \neq 0$ for $|z| \le 1$. Furthermore, let the coefficients $\{a_i\}$ obey

(K.2) Let the distribution function of $e_t$ be denoted by $F_t$. There are
two arbitrarily small positive numbers, $\alpha$ and $\delta_0$, and one arbitrarily large
positive number, $C_0$, such that for all $t = \ldots, -1, 0, 1, \ldots$ and $|x - y| < \delta_0$,

$|F_t(x) - F_t(y)| \le C_0 |x - y|^{\alpha}.$

(K.3) $\sup_{-\infty < t < \infty} E|e_t|^s < \infty$, $s = 1, 2, \ldots$.
(K.4) The maximal order $K_n$ satisfies

$C_l \le \frac{K_n^{2+\delta_1}}{n} \le C_u,$

where $\delta_1$, $C_l$ and $C_u$ are some prescribed positive numbers.
(K.5) $a_n \neq 0$ for infinitely many $n$.

First note that the MSPE of $\hat{x}_{n+1}(k)$, $q_n(k)$ [see (1.7)], can be expressed
as

(2.1)    $\sigma^2 + E(\mathbf{f}(k) + \mathcal{S}(k))^2,$

where

$\mathbf{f}(k) = \mathbf{x}_n'(k)\hat{R}_n^{-1}(k)\frac{1}{N}\sum_{j=K_n}^{n-1} \mathbf{x}_j(k) e_{j+1,k}, \qquad e_{j+1,k} = x_{j+1} + \sum_{l=1}^{k} a_l(k) x_{j+1-l},$

$(a_1(k), \ldots, a_k(k))' = \mathbf{a}(k) = \arg\min_{(c_1, \ldots, c_k)' \in R^k} E\Big(x_{k+1} + \sum_{l=1}^{k} c_l x_{k+1-l}\Big)^2$

and

$\mathcal{S}(k) = \sum_{i=1}^{\infty} (a_i - a_i(k)) x_{n+1-i},$

with $a_i(k) = 0$ for $i > k$. To simplify the notation, $\mathbf{a}(k)$ is sometimes viewed
as an infinite-dimensional vector with undefined entries set to zero. Ing and
Wei ([11], Theorem 3) obtained an asymptotic expression for $q_n(k) - \sigma^2$,
which holds uniformly for all $1 \le k \le K_n$. This result is summarized in the
following proposition.
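
The decomposition behind (2.1) can be traced in a few lines. The sketch below assumes that $\hat{\mathbf{a}}_n(k)$ is the least-squares estimate built from the pairs $(\mathbf{x}_j(k), x_{j+1})$, $j = K_n, \ldots, n-1$, normalized by $N = n - K_n$; under that assumption it is routine algebra rather than a new result.

```latex
\begin{align*}
x_{j+1} = e_{j+1,k} - \mathbf{x}_j'(k)\mathbf{a}(k)
  \quad &\Longrightarrow\quad
  \hat{\mathbf{a}}_n(k) - \mathbf{a}(k)
  = -\hat{R}_n^{-1}(k)\,\frac{1}{N}\sum_{j=K_n}^{n-1}\mathbf{x}_j(k)\,e_{j+1,k},\\
x_{n+1} - \hat{x}_{n+1}(k)
  &= x_{n+1} + \mathbf{x}_n'(k)\mathbf{a}(k)
   + \mathbf{x}_n'(k)\{\hat{\mathbf{a}}_n(k) - \mathbf{a}(k)\}
   = e_{n+1} - \mathcal{S}(k) - \mathbf{f}(k),\\
q_n(k) &= E\{e_{n+1} - \mathcal{S}(k) - \mathbf{f}(k)\}^2
   = \sigma^2 + E\{\mathbf{f}(k) + \mathcal{S}(k)\}^2 .
\end{align*}
```

The last equality uses that $e_{n+1}$ has mean zero and is independent of $x_1, \ldots, x_n$ and of the earlier values entering $\mathcal{S}(k)$, so the cross term vanishes.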



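Proposition 1. Assume that (K.1)-(K.4) hold. Then

(2.2)    $\lim_{n\to\infty} \max_{1 \le k \le K_n} \left| \frac{q_n(k) - \sigma^2}{L_n(k)} - 1 \right| = 0,$

where

$L_n(k) = \frac{k\sigma^2}{N} + \|\mathbf{a} - \mathbf{a}(k)\|_R^2,$

and for any infinite-dimensional vector $\mathbf{d} = (d_1, d_2, \ldots)'$, $\|\mathbf{d}\|_R^2 = \sum_{1 \le i,j < \infty} d_i d_j \gamma_{i-j}$,
with $\gamma_{i-j} = E(x_i x_j)$. Also note that $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 = E(\mathcal{S}^2(k))$.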

The first term of $L_n(k)$, $k\sigma^2/N$, which is proportional to $k$, can be viewed
as a measure of model complexity. The second term of $L_n(k)$, $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$,
which decreases as $k$ increases, measures the goodness of fit. If one attempts
to find an order $k$ whose corresponding predictor, $\hat{x}_{n+1}(k)$, has the minimal
MSPE, then some data-driven order selection criteria are needed. An order
selection criterion, $\hat{k}_n$, is said to be asymptotically efficient if $\hat{x}_{n+1}(\hat{k}_n)$
satisfies

(2.3)    $\limsup_{n\to\infty} \frac{q_n(\hat{k}_n) - \sigma^2}{\min_{1 \le k \le K_n} q_n(k) - \sigma^2} \le 1,$

where $1 \le \hat{k}_n \le K_n$. Inequality (2.3) says that the (second-order) MSPE of
the predictor with order determined by an asymptotically efficient criterion
is ultimately not greater than that of the best predictor among $\{\hat{x}_{n+1}(1), \ldots,
\hat{x}_{n+1}(K_n)\}$. In view of (2.2), (2.3) is equivalent to

(2.4)    $\limsup_{n\to\infty} \frac{q_n(\hat{k}_n) - \sigma^2}{L_n(k_n^*)} \le 1,$

where $L_n(k_n^*) = \min_{1 \le k \le K_n} L_n(k)$.
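
The trade-off between the two terms of $L_n(k)$ is easy to see numerically. The Python sketch below uses a stand-in misfit curve $e^{-\beta k}$ in place of $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$; the decay rate, the rule `K = int(N ** 0.4)` for the maximal order and all function names are hypothetical choices for illustration, not quantities taken from the paper.

```python
import math


def L_n(k, N, sigma2, misfit):
    """Complexity term k*sigma2/N plus the goodness-of-fit term misfit(k)."""
    return k * sigma2 / N + misfit(k)


def best_order(K, N, sigma2, misfit):
    """Minimizer of L_n over 1..K (smallest k in case of ties)."""
    return min(range(1, K + 1), key=lambda k: L_n(k, N, sigma2, misfit))


def efficiency_ratio(k, K, N, sigma2, misfit):
    """L_n(k) / min_k L_n(k); it equals 1 only at a minimizer."""
    k_star = best_order(K, N, sigma2, misfit)
    return L_n(k, N, sigma2, misfit) / L_n(k_star, N, sigma2, misfit)


if __name__ == "__main__":
    beta = 0.7                                    # hypothetical decay rate
    misfit = lambda k: math.exp(-beta * k)        # stand-in for the misfit term
    for N in (100, 1000, 10000):
        K = int(N ** 0.4)                         # hypothetical maximal order
        k_star = best_order(K, N, 1.0, misfit)
        print(N, k_star, round(math.log(N) / beta, 2))
```

An order selection rule is asymptotically efficient in the sense of (2.4) when the ratio computed by `efficiency_ratio` at the selected order tends to a limit no larger than one.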

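Let $\mathrm{OS}_n(k)$ be an order selection function and

(2.5)    $\hat{k}_{n,\mathrm{OS}} = \arg\min_{1 \le k \le K_n} \mathrm{OS}_n(k)$

be the selected order. We shall provide sufficient conditions under which
$q_n(\hat{k}_{n,\mathrm{OS}}) - \sigma^2$ can be asymptotically expressed in terms of the $L_n(\cdot)$ function.
Define

(2.6)    $L_{n,D_n}(k) = (D_n - 1)\frac{k\sigma^2}{N} + \|\mathbf{a} - \mathbf{a}(k)\|_R^2,$

where $D_n > 1$, and

(2.7)    $k^*_{n,D_n} = \arg\min_{1 \le k \le K_n} L_{n,D_n}(k).$

Proposition 2. Assume that (K.1)-(K.4) hold. If there exists a sequence
of positive numbers $\{D_n\}$, with $\liminf_{n\to\infty} D_n > 1$, such that

(2.8)    $\lim_{n\to\infty} (D_n - 1)\frac{E(\mathcal{S}(\hat{k}_{n,\mathrm{OS}}) - \mathcal{S}(k^*_{n,D_n}))^2}{L_{n,D_n}(k^*_{n,D_n})} = 0$

and

(2.9)    $\lim_{n\to\infty} (D_n - 1)\frac{E(\mathbf{f}(\hat{k}_{n,\mathrm{OS}}) - \mathbf{f}(k^*_{n,D_n}))^2}{L_{n,D_n}(k^*_{n,D_n})} = 0,$

where $\mathcal{S}(k)$ and $\mathbf{f}(k)$ are defined after (2.1), then

(2.10)    $\lim_{n\to\infty} \frac{q_n(\hat{k}_{n,\mathrm{OS}}) - \sigma^2}{L_n(k^*_{n,D_n})} = 1.$

Moreover, if

(2.11)    $\lim_{n\to\infty} \frac{L_n(k^*_{n,D_n})}{L_n(k_n^*)} = 1,$

then (2.3) [(2.4)] holds for $\hat{k}_n = \hat{k}_{n,\mathrm{OS}}$.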

Remark 1. If (K.l), (K.5), sviY)_oo<t<ooE\^t\'^ < and K„ = o(n^/^) 
are assumed, and (2.8) and (2.9) are replaced with 

(2.12) p-lim(L>„ - 1) — ^ = 0, 

then it is shown in Appendix A that

(2.13)    $\text{p-}\lim_{n\to\infty} \frac{E[(y_{n+1} - \hat{y}_{n+1}(\hat{k}_{n,\mathrm{OS}}))^2 \mid x_1, \ldots, x_n] - \sigma^2}{L_n(k^*_{n,D_n})} = 1,$

where $y_{n+1}$ is the future value of $\{y_1, \ldots, y_n\}$, which is a realization from
an independent copy of $\{x_t\}$, and $\hat{y}_{n+1}(k) = -\mathbf{y}_n'(k)\hat{\mathbf{a}}_n(k)$ with $\mathbf{y}_n(k) =
(y_n, \ldots, y_{n+1-k})'$. Note that (2.13) gives an asymptotic expression for the
(conditional) MSPE of $\hat{k}_{n,\mathrm{OS}}$ in independent-realization settings. For further
discussion, see Remark 6 in Section 3.

Proposition 2 asserts that if $\hat{k}_{n,\mathrm{OS}}$ is sufficiently close to $k^*_{n,D_n}$ in the
sense of (2.8) and (2.9), then $q_n(\hat{k}_{n,\mathrm{OS}}) - \sigma^2$ has the asymptotic expression
$L_n(k^*_{n,D_n})$. In addition, if (2.11) also holds, then $\hat{k}_{n,\mathrm{OS}}$ is asymptotically efficient.
Proposition 2 plays a prominent role in justifying $\mathrm{APE}_{\delta_n}$'s and $\mathrm{IC}_{P_n}$'s
asymptotic (in)efficiency in various situations; see Sections 3-5. To apply
Proposition 2, it is important to determine a penalty term $D_n$ associated
with the selection criterion $\mathrm{OS}_n(k)$ and then justify (2.8) and (2.9) under
suitable assumptions. For the first task, it is shown in Section 3 that the $D_n$
associated with $\mathrm{APE}_{\delta_n}(k)$ is

(2.14)    $D_{\mathrm{APE}_{\delta_n}} = 1 + (1 - \delta_n)^{-1}\log \delta_n^{-1}.$

According to Appendix C, the $D_n$ associated with $\mathrm{IC}_{P_n}(k)$ and

(2.15)    $S_n^{(P_n)}(k) = \Big(1 + \frac{P_n k}{N}\Big)\hat{\sigma}_n^2(k)$

is $P_n$. To facilitate the second task, inspired by Ing and Wei ([14], (3.9)),
assumption (K.6) (see below) is frequently used in the rest of this paper. As
will be seen in Appendices B-D, it is introduced to deal with the complicated
dependency conditions among the selected order, estimated parameters and
future observations. (Note that in the independent-realization settings, the
future value to be predicted is independent of the selected order and estimated
parameters.)
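
How the penalty attached to $\mathrm{APE}_{\delta_n}$ reacts to the starting stage can be checked numerically. The Python sketch below takes the penalty to be $1 + (1 - \delta)^{-1}\log \delta^{-1}$, the expression that the text later pairs with $P_n$; the function name `ape_penalty` and the three example schedules are assumptions made here for illustration.

```python
import math


def ape_penalty(delta):
    """Return 1 + log(1/delta) / (1 - delta) for 0 < delta < 1."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return 1.0 + math.log(1.0 / delta) / (1.0 - delta)


if __name__ == "__main__":
    n = 10_000
    schedules = {
        "fixed 0.5": 0.5,                                  # bounded away from 1
        "1 - (log n)^(-0.4)": 1.0 - math.log(n) ** -0.4,   # tends to 1 slowly
        "n^(-0.2)": n ** -0.2,                             # tends to 0 polynomially
    }
    for name, d in schedules.items():
        print(name, round(ape_penalty(d), 3))
```

The penalty decreases toward 2, the AIC weight, as `delta` approaches 1, and grows without bound as `delta` approaches 0, which matches the qualitative picture developed in Sections 3 and 4.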

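(K.6) For any $\xi > 0$, there are a nonnegative exponent $\theta = \theta(\xi) < 1$
and a positive number $M = M(\xi)$ such that

(2.16)    $\liminf_{n\to\infty} R_n(\xi, \theta, M) > 0,$

where, with $K_n$ obeying (K.4), $D_n$ satisfying $\liminf_{n\to\infty} D_n > 1$ and $D_n = o(n)$,
and $A_{D_n,\theta,M} = \{k : 1 \le k \le K_n, |k - k^*_{n,D_n}| \ge M (k^*_{n,D_n})^{\theta}\}$,

$R_n(\xi, \theta, M) = \min_{k \in A_{D_n,\theta,M}} (k^*_{n,D_n})^{\xi}\,\frac{N\{L_{n,D_n}(k) - L_{n,D_n}(k^*_{n,D_n})\}}{(D_n - 1)|k - k^*_{n,D_n}|}.$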

If $\{x_t\}$ is an AR process of finite order [viz., (K.5) does not hold], then
(2.16) automatically holds. On the other hand, if (K.5) holds instead, by
arguments similar to those in Examples 1 and 2 and the Appendix of Ing
and Wei [14], it can also be shown that (2.16) is satisfied in the following
cases: (a) the exponential-decay case,

(2.17)    $C_1 k^{-\theta_1} e^{-\beta k} \le \|\mathbf{a} - \mathbf{a}(k)\|_R^2 \le C_2 k^{\theta_1} e^{-\beta k},$

where $C_2 \ge C_1 > 0$, $\theta_1 \ge 0$ and $\beta > 0$ [note that if (K.1) is assumed, then
(2.17) is equivalent to $C_1' k^{-\theta_1} e^{-\beta k} \le \sum_{i > k} a_i^2 \le C_2' k^{\theta_1} e^{-\beta k}$, for some $C_2' \ge
C_1' > 0$]; and (b) the algebraic-decay case,

(2.18)    $(C_3 - M_1 k^{-\xi_1}) k^{-\beta} \le \|\mathbf{a} - \mathbf{a}(k)\|_R^2 \le (C_3 + M_1 k^{-\xi_1}) k^{-\beta},$

where $C_3, M_1 > 0$, $\xi_1 \ge 2$ and $\beta > 1 + \delta_1$ [recall that $\delta_1$ is defined in (K.4)].
These facts reveal that (2.16) is quite reasonable from both practical and
theoretical points of view, since it includes the ARMA model (which is the
most used short-memory time series model by far) and the AR($\infty$) model
with algebraically decaying coefficients (which is of much theoretical interest
in the context of model selection) as special cases.

It is worth mentioning that when (K.1)-(K.6) are assumed, (2.8) and
(2.9) were verified by Ing and Wei ([14], (5.75) and (5.74), resp.) in the
special case where $\hat{k}_{n,\mathrm{OS}} = \arg\min_{1 \le k \le K_n} S_n^{(2)}(k)$ and $D_n = 2$. Using similar
arguments and assumptions, it can be shown that (2.8) and (2.9) are still
valid for $\hat{k}_{n,\mathrm{OS}} = \hat{k}_{n,P_n}$ [defined in (1.10)], $D_n = P_n$ and $1 < P_n = \alpha < \infty$
independent of $n$. However, to justify (2.8) and (2.9) in the case $\hat{k}_{n,\mathrm{OS}} = \hat{k}_{n,\delta_n}$
and $D_n = D_{\mathrm{APE}_{\delta_n}}$, or in the case $\hat{k}_{n,\mathrm{OS}} = \hat{k}_{n,P_n}$ and $D_n = P_n \to \infty$, a much
more delicate analysis is required. This problem is tackled in the next two
sections. Note that in the finite-order AR models, (2.8) and (2.9) can also
be verified for these two types of criteria under suitable assumptions; see
Section 5 for more details.

As observed in Proposition 2, (2.11) is an important key to the asymptotic
efficiency. It holds if the penalty term $D_n$ satisfies

(2.19)    $\lim_{n\to\infty} D_n = 2.$

To see this, first observe that

(2.20)    $\max_{1 \le k \le K_n} \left| \frac{L_{n,D_n}(k)}{L_n(k)} - 1 \right| \to 0$

if $D_n \to 2$. The result (2.20) and the fact that

(2.21)    $1 \le \frac{L_n(k^*_{n,D_n})}{L_n(k_n^*)} = \frac{L_n(k^*_{n,D_n})/L_{n,D_n}(k^*_{n,D_n})}{L_n(k_n^*)/L_{n,D_n}(k_n^*)} \cdot \frac{L_{n,D_n}(k^*_{n,D_n})}{L_{n,D_n}(k_n^*)} \le \frac{L_n(k^*_{n,D_n})/L_{n,D_n}(k^*_{n,D_n})}{L_n(k_n^*)/L_{n,D_n}(k_n^*)}$

yield (2.11). When $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$ decays exponentially or (K.5) is violated,
(2.11) can hold without (2.19); see Example 1 in Section 3 and the proof of
Theorem 4 in Appendix D. For some other interesting discussion regarding
(2.11), see Examples 2 and 3 in Section 3 and Examples 6-8 in Section 4.
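
The uniform bound behind (2.20) can be written out directly. The lines below take $L_{n,D_n}(k) = (D_n - 1)k\sigma^2/N + \|\mathbf{a} - \mathbf{a}(k)\|_R^2$ and $L_n(k) = L_{n,2}(k)$, which is how (2.6) is read here; it is a short check, not part of the original argument.

```latex
\begin{align*}
L_{n,D_n}(k) - L_n(k) &= (D_n - 2)\,\frac{k\sigma^2}{N},
\qquad 0 \le \frac{k\sigma^2}{N} \le L_n(k),\\
\max_{1 \le k \le K_n}\left|\frac{L_{n,D_n}(k)}{L_n(k)} - 1\right|
 &\le |D_n - 2| \longrightarrow 0 \quad\text{whenever } D_n \to 2 .
\end{align*}
```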

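3. The MSPE of $\mathrm{APE}_{\delta_n}$ in AR($\infty$) processes. This section provides
asymptotic expressions for $q_n(\hat{k}_{n,\delta_n}) - \sigma^2$. Without loss of generality, $n\delta_n$,
$1/n \le \delta_n \le 1 - (1/n)$, is assumed to be a positive integer. First note that

(3.1)    $\mathrm{APE}_{\delta_n}(k) = \sum_{i=n\delta_n}^{n-1} (x_{i+1} + \mathbf{x}_i'(k)\hat{\mathbf{a}}_i(k))^2 = \sum_{i=n\delta_n}^{n-1} (e_{i+1,k} + c_{i,k})^2,$

where $c_{i,k} = \mathbf{x}_i'(k)(\hat{\mathbf{a}}_i(k) - \mathbf{a}(k))$ and $e_{i+1,k}$ is defined after (2.1). Following
Lai and Wei ([19], (2.7)),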

n— 1 71—1 71—1 

^?,fc= ^i(^)e?+i,fc+Qn5„(A;)-g„(A;)+ ^ hi{k)elk 

i=n6n i=n5n i=nSn 



(3.2) 



where 



n-l 

i=n5n 



/i,(A:)=x^(A:)('^ x,(fc)x;.(fc)) x,(fc) 



and 



On substituting (3.2) into (3.1), one obtains 

n-l ( n~\ \ 

APE5„(/c)= 4i+ E /i^(A:)e2+i,fc-/ca2log5-H 

i=nSn \i=n5n ) 



(3.3) 



n— 1 71—1 
i=nSn i=nS„ 



n-l 



E - ei+i)^ - l|a - a(fc)|||} 



i=nSn 



71-1 

+ 2 V (cj+i fc - ei+i)ei+i +n(l - 5„)L„ z) (A;), 

j = 71(5n 

where $D_{\mathrm{APE}_{\delta_n}}$ is defined in (2.14). When $\delta_n$ is bounded away from 1, Theorem 1
below provides sufficient conditions under which (2.8) and (2.9) hold
for $\hat{k}_{n,\mathrm{OS}} = \hat{k}_{n,\delta_n}$ and $D_n = D_{\mathrm{APE}_{\delta_n}}$. As a result, an asymptotic expression
for $q_n(\hat{k}_{n,\delta_n}) - \sigma^2$ is obtained. Note that the relation $D_n = D_{\mathrm{APE}_{\delta_n}}$ is used
in the rest of this section.

Theorem 1. Assume that (K.1)-(K.6) hold and $1/n \le \delta_n \le 1 - (1/n)$
satisfies

(3.4)    $\limsup_{n\to\infty} \delta_n < 1$

and

(3.5)    $0 < \liminf_{n\to\infty} n^{\theta_3}\delta_n \le \infty,$

where $0 \le \theta_3 < \delta_1/(2 + \delta_1)$. Moreover, if for some $0 \le \theta = \theta(\xi) < 1$ and $\eta > 0$,

(3.6) lim ^- — , , = 0, 

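where $\theta$ is obtained from (K.6) when

(3.7)    $0 < \xi < \min\{1/2, \{(2 + \delta_1)(1 - \theta_3)/2\} - 1\},$

then (2.8) and (2.9) are true for $\hat{k}_{n,\mathrm{OS}} = \hat{k}_{n,\delta_n}$. Hence,

(3.8)    $\lim_{n\to\infty} \frac{q_n(\hat{k}_{n,\delta_n}) - \sigma^2}{L_n(k^*_{n,D_n})} = 1.$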

Remark 2. Note that $\theta$ in (3.6) is not uniquely determined. In order for
(3.6) to be less stringent, $\theta$ can be chosen as small as possible; see Examples
1 and 3 below for more details.

Remark 3. Consider the following assumption:
(K.6') For any $\xi > 0$, there is a subsequence of $\{n\}$, $\{n_i\}$, such that

(3.9)    $\liminf_{i\to\infty} R_{n_i}(\xi, 0, 1) > 0.$

[Recall that $R_n(\xi, \theta, M)$ is defined in (K.6).] If, in place of (K.6), (K.6')
is assumed in Theorem 1, then it can be shown that (3.8) remains valid
for $n = n_i$, without imposing (3.6). This finding is applied in Example 2
below to illustrate that the $\mathrm{APE}_{\delta_n}$ criteria, with $\delta_n$ decreasing to 0 at a
polynomial rate, perform poorly in the case where the AR coefficients decay
exponentially fast.

Remark 4. Under (K.5), it is not difficult to see that $k^*_{n,D_n} \to \infty$ as
$n \to \infty$. Therefore, when $0 < \delta_n = \delta < 1$ is fixed with $n$, (3.6) automatically
holds.



The following examples help gain a better understanding of Theorem 1.

Example 1. Assume that (K.1)-(K.4) hold and the AR coefficients satisfy

(3.10)    $C_1 e^{-\beta k} \le \|\mathbf{a} - \mathbf{a}(k)\|_R^2 \le C_2 e^{-\beta k},$

where $0 < C_1 \le C_2 < \infty$ and $\beta > 0$. Note that (3.10) is satisfied by any causal
and invertible ARMA($p$, $q$) model with $q > 0$. As mentioned in Section 2,
when (K.1) is assumed, (3.10) can also be expressed as $C_1' e^{-\beta k} \le \sum_{i > k} a_i^2 \le
C_2' e^{-\beta k}$ for some $0 < C_1' \le C_2' < \infty$. We shall show in this example that (3.8)
follows if $\delta_n$ satisfies (3.4) and

(3.11)    $\log \delta_n^{-1} = o(\log n).$

Condition (3.11) guarantees (3.5). It can be shown to be equivalent to
$n^{\nu}\delta_n \to \infty$ for all $\nu > 0$. Obviously, (3.4) and (3.11) are satisfied if $0 < \delta_n =
\delta < 1$ is independent of $n$ or if $\delta_n = (\log n)^{-\nu_1}$ for some $\nu_1 > 0$. Therefore, in
view of Theorem 1 and the discussion given after (K.6), it remains to verify
(3.6). By (3.10) and an argument similar to that used in the Appendix of
[14], for some $\bar{C}_1 > 0$,

(3.12)    $\frac{1}{\beta}\log n - \frac{1}{\beta}\log(D_n - 1) - \bar{C}_1 \le k^*_{n,D_n} \le \frac{1}{\beta}\log n - \frac{1}{\beta}\log(D_n - 1) + \bar{C}_1,$

and for any $\xi > 0$, (2.16) holds for $\theta = 0$ and some $M > 0$ (or for any $0 < \theta < 1$
and any $M > 0$). As a result, (3.6) holds for $\theta = 0$ and $\eta > 0$, and hence (3.8)
follows. Moreover, (3.10) and the same argument used to prove (A.1) of [14]
yield that for some $\bar{C}_2 > 0$,

(3.13)    $\frac{1}{\beta}\log n - \bar{C}_2 \le k_n^* \le \frac{1}{\beta}\log n + \bar{C}_2.$

According to (3.10)-(3.13), we obtain (2.11), which together with (3.8) implies
that $\mathrm{APE}_{\delta_n}$, with $\delta_n$ satisfying (3.4) and (3.11), is asymptotically efficient.



Example 2. This example is given to indicate that if $\delta_n$ decays to 0 at
a polynomial rate, then $\mathrm{APE}_{\delta_n}$ cannot be asymptotically efficient even in
the exponential-decay case. More specifically, assume that (K.1)-(K.4) are
satisfied,

(3.14)    $\delta_n = C_1 n^{-\theta_3},$

where $C_1 > 0$ and $0 < \theta_3 < \delta_1/(2 + \delta_1)$, and the AR coefficients obey a special
case of (3.10),

(3.15)    $\|\mathbf{a} - \mathbf{a}(k)\|_R^2 = C_2 e^{-\beta k}(1 + G_k),$

where $|G_k| < 1$, $G_k \to 0$ as $k \to \infty$ and $C_2$ and $\beta$ are some positive numbers.
By (3.15) and an argument similar to that used in Case II of [25], page 162,
for sufficiently large $n$,

(3.16)    $k^*_{n,D_n} = m_{1,n}$ or $m_{1,n} + 1,$

where $m_{1,n}$ is the largest integer $\le m^*_{1,n} = (1/\beta)\log[C_2 N\beta/\{(D_n - 1)\sigma^2\}]$.
Define $x_{1,n} = m^*_{1,n} - m_{1,n}$ and $z = (1/\beta)\log[\beta/(1 - e^{-\beta})]$. Since $0 < z < 1$,
there is a positive number $\kappa$ such that $0 < z - \kappa < z + \kappa < 1$. Define a set of
positive integers $A_{\kappa} = \{n : |x_{1,n} - z| > \kappa, n = 1, 2, \ldots\}$. Then, it can be shown
that $A_{\kappa}$ contains infinitely many elements. Moreover, for any $\xi > 0$ and any
sequence of positive integers $\{n_i\} \subset A_{\kappa}$, (3.9) holds. Therefore, according to
Remark 3, (3.8) is valid for $n = n_i$. By analogy with (3.16), for sufficiently
large $n$,

(3.17)    $k_n^* = m_{2,n}$ or $m_{2,n} + 1,$

where $m_{2,n}$ is the largest integer $\le (1/\beta)\log(C_2 N\beta/\sigma^2)$. (3.14)-(3.17) yield

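(3.18)    $\lim_{n\to\infty} \frac{L_n(k^*_{n,D_n})}{L_n(k_n^*)} > 1,$

which, together with (3.8) (with $n = n_i$) gives

(3.19)    $\limsup_{n\to\infty} \frac{q_n(\hat{k}_{n,\delta_n}) - \sigma^2}{L_n(k_n^*)} > 1.$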

As a result, $\mathrm{APE}_{\delta_n}$, with $\delta_n$ satisfying (3.14), fails to achieve (2.4) in the
exponential-decay case.
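
The integer characterization in (3.16) and (3.17) can be checked by brute force when $G_k = 0$. The Python sketch below minimizes $(D - 1)k\sigma^2/N + C_2 e^{-\beta k}$ over the integers and compares the result with the stationary point $(1/\beta)\log[C_2 N\beta/\{(D - 1)\sigma^2\}]$. The parameter values, the upper bound `K = 200` and the stand-in penalty `1 + theta3 * log(N)` for a polynomially decaying $\delta_n$ are assumptions made for illustration.

```python
import math


def objective(k, N, D, C2=1.0, beta=0.7, sigma2=1.0):
    """(D - 1) k sigma2 / N + C2 exp(-beta k): penalized complexity plus misfit."""
    return (D - 1.0) * k * sigma2 / N + C2 * math.exp(-beta * k)


def k_star(N, D, K, **kw):
    """Brute-force integer minimizer of the objective over 1..K."""
    return min(range(1, K + 1), key=lambda k: objective(k, N, D, **kw))


def m_star(N, D, C2=1.0, beta=0.7, sigma2=1.0):
    """Stationary point of the objective when k is treated as continuous."""
    return math.log(C2 * N * beta / ((D - 1.0) * sigma2)) / beta


if __name__ == "__main__":
    theta3 = 0.2                                   # hypothetical polynomial rate
    for N in (10**3, 10**4, 10**5, 10**6):
        D = 1.0 + theta3 * math.log(N)             # penalty growing like log n
        k_pen, k_aic = k_star(N, D, 200), k_star(N, 2.0, 200)
        m = math.floor(m_star(N, D))
        # ratio of unpenalized losses (D = 2) at the two selected orders
        ratio = objective(k_pen, N, 2.0) / objective(k_aic, N, 2.0)
        print(N, k_pen, (m, m + 1), k_aic, round(ratio, 3))
```

Since the objective is convex in $k$, its integer minimizer is always the floor of the stationary point or the next integer, which is what the printed pairs confirm.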

Example 3. This example investigates the prediction performance of
$\mathrm{APE}_{\delta_n}$ in the algebraic-decay case (2.18). If (2.18), (3.4) and (3.5) are satisfied,
then the same argument as the one in Example 2 of [14] yields that

(3.20)    $k^*_{n,D_n} = \{N C_3 \beta (D_n - 1)^{-1}\sigma^{-2}\}^{1/(\beta+1)} + O(1),$

and for any $\xi > 0$, (2.16) holds for any $1 - \min\{\xi, 1\} < \theta < 1$ and any $M > 0$.
These facts and (3.5) guarantee that (3.6) is valid for $1 - \min\{\xi, 1\} < \theta < 1$
and $0 < \eta < (1 - \theta)/6$, where $\xi$ satisfies (3.7). Consequently, when (K.1)-
(K.4), (2.18), (3.4) and (3.5) are assumed, (3.8) is ensured by Theorem 1.
By (A.9) of [14],

(3.21)    $k_n^* = (N C_3 \beta \sigma^{-2})^{1/(\beta+1)} + O(1).$

This, (2.18), (3.4), (3.5) and (3.20) imply that

(3.22)    $\liminf_{n\to\infty} \frac{L_n(k^*_{n,D_n})}{L_n(k_n^*)} > 1.$

According to (3.8) and (3.22), the $\mathrm{APE}_{\delta_n}$ is not asymptotically efficient in
this case.
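
The loss of efficiency in the algebraic-decay case can be seen with a small computation. The Python sketch below uses the exact curve $C_3 k^{-\beta}$ as a stand-in for $\|\mathbf{a} - \mathbf{a}(k)\|_R^2$, minimizes the penalized loss by brute force, and evaluates the resulting order under the unpenalized loss ($D = 2$). The parameter values and the bound `K = 1000` are hypothetical.

```python
def loss(k, N, D, C3=1.0, beta=2.0, sigma2=1.0):
    """(D - 1) k sigma2 / N + C3 k^(-beta); D = 2 gives the unpenalized loss."""
    return (D - 1.0) * k * sigma2 / N + C3 * k ** (-beta)


def k_star(N, D, K, **kw):
    """Brute-force integer minimizer of the penalized loss over 1..K."""
    return min(range(1, K + 1), key=lambda k: loss(k, N, D, **kw))


def closed_form(N, D, C3=1.0, beta=2.0, sigma2=1.0):
    """Leading term {N C3 beta / ((D - 1) sigma2)}^(1/(beta+1)), as in (3.20)."""
    return (N * C3 * beta / ((D - 1.0) * sigma2)) ** (1.0 / (beta + 1.0))


def efficiency_ratio(N, D, K, **kw):
    """Unpenalized loss at the penalized minimizer over the best unpenalized loss."""
    k_pen = k_star(N, D, K, **kw)
    k_best = k_star(N, 2.0, K, **kw)
    return loss(k_pen, N, 2.0, **kw) / loss(k_best, N, 2.0, **kw)


if __name__ == "__main__":
    N, K = 10**6, 1000
    for D in (2.0, 2.386, 3.0, 5.0):
        print(D, k_star(N, D, K), round(closed_form(N, D), 1),
              round(efficiency_ratio(N, D, K), 4))
```

In this toy setting the ratio equals one only for $D = 2$ and stays strictly above one for any other fixed penalty, in line with (3.22).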
As can be seen from Example 3, due to violation of (2.11), $\mathrm{APE}_{\delta_n}$, with $\delta_n$
bounded away from 1, is not asymptotically efficient in the algebraic-decay
case. In view of (2.19)-(2.21), a natural remedy for this difficulty is to let
$\delta_n \to 1$. However, problems can still occur if $\delta_n$ converges "too fast" to 1. To
see this, let $\delta_n = 1 - 1/n$. Then $\mathrm{APE}_{\delta_n}(k) = (x_n - \hat{x}_n(k))^2$. Since models
are determined only by the last period's prediction errors, it does not seem
possible to establish any (asymptotically) optimal selection result in this
case. To resolve this dilemma, some suitable choices of $\delta_n$ are introduced in
Theorem 2. Some examples are also given after the theorem to help gain
further insight into it.

Theorem 2. Assume that (K.1)-(K.6) hold and $1/n \le \delta_n \le 1 - (1/n)$
satisfies $\lim_{n\to\infty} \delta_n = 1$. Moreover, if either of the following conditions holds,
then (2.8) and (2.9) are valid for $\hat{k}_{n,\mathrm{OS}} = \hat{k}_{n,\delta_n}$.

(i) $\lim_{n\to\infty} k^*_{n,D_n}/n^{\theta_3} = 0$ for any $\theta_3 > 0$ and $(1 - \delta_n)^{-1} = O((k^*_{n,D_n})^{\theta_2})$ for
some $0 < \theta_2 < 1/2$.

(ii) $(1 - \delta_n)^{-1} = O((k^*_{n,D_n})^{\theta_2})$ for some $0 < \theta_2 < \min\{1/2, \delta_1/2\}$.

Consequently, $\mathrm{APE}_{\delta_n}$ is asymptotically efficient in the sense of (2.3) [(2.4)].

In light of Theorem 2, the following examples demonstrate how to choose
$\delta_n$ such that the resulting $\mathrm{APE}_{\delta_n}$ is asymptotically efficient in both the
exponential- and algebraic-decay cases.

Example 4. Assume that (K.1)-(K.4) hold and the AR coefficients
obey (2.17). Although Example 1 shows that when $\theta_1$ in (2.17) is equal to 0,
$\mathrm{APE}_{\delta_n}$, with $\delta_n$ satisfying (3.4) and (3.11), is asymptotically efficient, it is
unclear whether this result still holds for $\theta_1 > 0$. Fortunately, this difficulty
can be bypassed by letting

(3.23)    $\delta_n = 1 - C_1(\log n)^{-r},$

with $C_1 > 0$ and $0 < r < 1/2$. First note that under (2.17) and (3.23), the
same argument as in Example 1 of [14] yields that for some $C_2 > 0$,

(3.24)    $\frac{1}{\beta}\log n - C_2 \log_2 n \le k^*_{n,D_n} \le \frac{1}{\beta}\log n + C_2 \log_2 n,$

and for any $\xi > 0$, (2.16) holds for any $0 < \theta < 1$ and any $M > 0$. Moreover,
since condition (i) of Theorem 2 is ensured by (3.23) and (3.24), $\mathrm{APE}_{\delta_n}$,
with $\delta_n$ satisfying (3.23), is asymptotically efficient under (2.17).

Example 5. This example shows that if $\delta_n$ satisfies (3.23) with $C_1 > 0$
and $0 < r < \infty$, then the corresponding $\mathrm{APE}_{\delta_n}$ is asymptotically efficient
under the algebraic-decay case (2.18). To see this, first note that following
the same line of reasoning as in Example 2 of [14], (3.20) is still valid, and
for any $\xi > 0$, (2.16) holds for any $1 - \min\{\xi, 1\} < \theta < 1$ and any $M > 0$.
In addition, since condition (ii) of Theorem 2 is ensured by (3.20) and the
condition imposed on $\delta_n$, the desired result follows from Theorem 2.



Examples 4 and 5 suggest that to achieve asymptotic efficiency through
$\mathrm{APE}_{\delta_n}$ in both the exponential- and algebraic-decay cases, $\delta_n$ can be chosen
to satisfy (3.23) with $C_1 > 0$ and $0 < r < 1/2$. However, the question of how
to determine the best $C_1$ and $r$ seems difficult to answer from a finite sample
point of view. For some simulation results illustrating $\mathrm{APE}_{\delta_n}$'s performance
in finite samples, see Section 6. We close this section with two remarks
concerning the performance of $\mathrm{APE}_{\delta_n}$ in finite-order AR models and for
independent-realization predictions.
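
In practice the schedule (3.23) only fixes the stage at which accumulation starts. The Python sketch below turns it into an integer starting index; the default values `c1 = 0.5` and `r = 0.4`, the clipping to $[1/n, 1 - 1/n]$ and the rounding down are assumptions for illustration, since the text gives no finite-sample recommendation for $C_1$ and $r$.

```python
import math


def delta_schedule(n, c1=0.5, r=0.4):
    """delta_n = 1 - c1 (log n)^(-r), clipped to the interval [1/n, 1 - 1/n]."""
    d = 1.0 - c1 * math.log(n) ** (-r)
    return min(max(d, 1.0 / n), 1.0 - 1.0 / n)


def start_stage(n, c1=0.5, r=0.4):
    """Integer stage n * delta_n at which accumulation begins (rounded down)."""
    return max(1, math.floor(n * delta_schedule(n, c1, r)))


if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        s = start_stage(n)
        print(n, round(delta_schedule(n), 4), s, n - s)
```

Although $\delta_n$ tends to 1, the number of accumulated errors $n - n\delta_n$ still grows with $n$, in contrast with the degenerate choice $\delta_n = 1 - 1/n$ discussed before Theorem 2.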



Remark 5. When (1.1) degenerates to an AR($p_0$) model with $1 \le p_0 <
\infty$, it can be shown that $\hat{k}_{n,\delta_n}$, with $\liminf_{n\to\infty} \delta_n > 0$, is not a consistent
estimator of $p_0$ (e.g., [15]). On the other hand, if $\delta_n \to 0$ at a certain rate,
then the corresponding $\mathrm{APE}_{\delta_n}$ is consistent and asymptotically efficient (see
Appendix D). Since these results and Theorem 2 offer totally different suggestions
for choosing $\delta_n$, it becomes very challenging to achieve asymptotic
efficiency through $\mathrm{APE}_{\delta_n}$ when (1.1) is allowed to degenerate to a finite
autoregression. In Section 5, some selection criteria to remedy this difficulty
are proposed.



Remark 6. Note that the $\mathrm{APE}_{\delta_n}$ described in Theorem 2 is also asymptotically
efficient for independent-realization predictions. By Corollary B.1
(see Appendix B),

/o r,r;^ T -^n,Dn{kn,5n) p, 

(3.25) p-hm . r -1 = 0. 

Armed with (3.25) and (2.19)-(2.21), it can be shown that (2.12) holds for
$\hat{k}_{n,\mathrm{OS}} = \hat{k}_{n,\delta_n}$. Consequently, Remark 1 and (2.11) guarantee that

(3.26) p-hm — — = 1, 



which gives the claimed result. For more details on the definition of asymp- 
totic efficiency in independent-realization settings, see [2, 16] and [25]. 
4. The MSPE of $\mathrm{IC}_{P_n}$ in AR($\infty$) processes. In this section, prediction
performance of the information criterion $\mathrm{IC}_{P_n}(k)$, $P_n > 1$, is investigated.
When $P_n$ is independent of $n$, Ing and Wei ([14], Corollary 1) obtained an
asymptotic expression for $q_n(\hat{k}_{n,P_n}) - \sigma^2$, where $\hat{k}_{n,P_n}$, defined in (1.10), is the
minimizer of $\mathrm{IC}_{P_n}(k)$, with $1 \le k \le K_n$ and $K_n$ satisfying (K.4). Theorem
3 below extends Ing and Wei's result to the case where $P_n$ is allowed to
tend to $\infty$ with $n$. Note that the relation $D_n = P_n$ is used throughout this
section.

Theorem 3. Let (K.1)-(K.6) hold and $P_n$ satisfy

(4.1)    $\liminf_{n\to\infty} P_n > 1$

and

(4.2)    $P_n = O(n^{\theta_3}),$

for some $0 \le \theta_3 < (1 + \delta_1)/(4 + 2\delta_1)$. Moreover, if (3.6) holds with (3.7)
replaced by

(4.3)    $0 < \xi < \min\{1/2, \delta_1/2, \{(1/2) - \theta_3\}(2 + \delta_1), (1 + \delta_1) - 2\theta_3(2 + \delta_1)\},$

then (2.8) and (2.9) are true for $\hat{k}_{n,\mathrm{OS}} = \hat{k}_{n,P_n}$. Consequently,

(4.4)    $\lim_{n\to\infty} \frac{q_n(\hat{k}_{n,P_n}) - \sigma^2}{L_n(k^*_{n,P_n})} = 1.$

Remark 7. If in Theorem 3, (K.6') holds instead of (K.6), then it can
be shown that (4.4) is still valid for $n = n_i$ [note that $n_i$ is defined in (K.6')].
In this case, condition (3.6) is not required. This result can be applied to
verify that BIC is not asymptotically efficient in the exponential-decay case;
see Example 7 for more details.

Remark 8. Since (K.5) is assumed, (3.6) holds automatically if $P_n =
O(1)$.

The following examples illustrate implications of Theorem 3. Special emphasis
is placed on comparing the predictive capabilities of three well-known
information criteria, AIC, HQ and BIC, in various situations.

Example 6. Assume that (K.1)-(K.4) hold and the AR coefficients satisfy
(3.10). We shall show that $\mathrm{IC}_{P_n}(k)$, with $P_n$ satisfying (4.1) and

(4.5)    $P_n = o(\log n),$

is asymptotically efficient. Therefore, the AIC and HQ criteria are asymptotically
efficient in this case. To see this, first note that the same reasoning
as in Example 1 yields (3.12) and that for any $\xi > 0$, (2.16) holds for $\theta = 0$
and some $M > 0$ (or for any $0 < \theta < 1$ and any $M > 0$). These results and
(4.5) imply that (3.6) holds for $\theta = 0$ and $\eta > 0$. According to Theorem 3,
(4.4) follows. Consequently, the claimed result is ensured by (4.4) and by
observing that (2.11) is valid under (3.10), (3.12), (3.13) and (4.5).

Example 7. This example illustrates that an information criterion cannot
be asymptotically efficient in the exponential-decay case when the weight
for penalizing the number of regressors in the model is "too strong." To see
this, let (K.1)-(K.4) and (3.15) be satisfied and

(4.6)    $P_n = C_1(\log n)^{C_2},$

where $C_1 > 0$ and $C_2 \ge 1$. Under these assumptions, (3.16) is obtained and
(3.9) holds for any $\xi > 0$ and any sequence $\{n_i\} \subset A_{\kappa}$, where $A_{\kappa}$ is defined
as in Example 2. By Remark 7, (4.4) is valid for $n = n_i$. Moreover, since
(3.15)-(3.17) and (4.6) yield (3.18), it is concluded that $\mathrm{IC}_{P_n}(k)$, with $P_n$
given by (4.6), is not asymptotically efficient. One important implication of
this example is that BIC is not asymptotically efficient in the exponential-decay
case.



Example 8. Consider the algebraic-decay case, (2.18). Let $P_n$ satisfy
(4.1) and

(4.7) $P_n = O((\log n)^{C_1})$,

for some $C_1 > 0$. By an argument similar to that used in Example 3, one obtains
(3.20), and for any $\xi > 0$, (2.16) holds for any $1 - \min\{\xi, 1\} < \theta < 1$ and
any $M > 0$. These facts and (4.7) imply that (3.6) is valid for $1 - \min\{\xi, 1\} <$
9 <l and Q <r] <{l-0)/6, where i satisfies (4.3). As a result, (4.4) follows 
from Theorem 3. Moreover, by (2.19)-(2.21), (2.11) holds when $\lim_{n\to\infty} P_n = 2$;
and by (2.18), (3.20) and (3.21), $\limsup_{n\to\infty} L_n(k^*_{n,P_n})/L_n(k^*_n) > 1$ when
$\lim_{n\to\infty} P_n \ne 2$. This observation and (4.4) imply that AIC and AICc [9]
are asymptotically efficient in the algebraic-decay case, (2.18), whereas HQ,
BIC and any information criterion with $\lim_{n\to\infty} P_n \ne 2$ are not.



As a final remark, note that when the conditions imposed by Theorems
1 and 3 (or Theorems 2 and 3) hold and

(4.8) $\lim_{n\to\infty} \dfrac{P_n}{1 + (1-\delta_n)^{-1}\log\delta_n^{-1}} = 1$,

then

(4.9) $\lim_{n\to\infty} \dfrac{E\{x_{n+1} - \hat{x}_{n+1}(\hat{k}_{n,P_n})\}^2 - \sigma^2}{E\{x_{n+1} - \hat{x}_{n+1}(\hat{k}_{n,\delta_n})\}^2 - \sigma^2} = 1$.

Instead of attempting to achieve a certain asymptotic optimality for
prediction, (4.8) and (4.9) are interesting in that (4.8) can be used to connect
the sequence $\delta_n$ defining $\mathrm{APE}_{\delta_n}$ with a corresponding parameter estimation
penalty weight sequence, $P_n = 1 + (1-\delta_n)^{-1}\log\delta_n^{-1}$, in such a way that
$\hat{x}_{n+1}(\hat{k}_{n,P_n})$ and $\hat{x}_{n+1}(\hat{k}_{n,\delta_n})$ have the same asymptotic same-realization
prediction (in)efficiency, as observed in (4.9). Conversely, a sequence $P_n$
implicitly determines a sequence $\delta_n$ through this same relation, which yields
identical asymptotic (in)efficiency for $\mathrm{IC}_{P_n}$ and $\mathrm{APE}_{\delta_n}$. This connection not
only imparts to the $\mathrm{APE}_{\delta_n}$ criteria the deep foundations of the information
criteria, but also endows the information criteria with an on-line prediction
meaning. For a related result, Wei ([29], Theorem 4.2.2), under (1.1)
and certain moment conditions on $\varepsilon_t$ (which can be verified for the normal
distribution), established an algebraic connection between BIC and APE,
$\log(\mathrm{APE}(k)/n) = \mathrm{BIC}(k) + o(\log n/n)$ a.s., where $k$ is a positive integer that
does not vary with $n$. Therefore, except for the $o(\log n/n)$ term, the logarithm of
$\mathrm{APE}(k)/n$ is (a.s.) identical to $\mathrm{BIC}(k)$. Hannan, McDougall and Poskitt [6]
also obtained the same result in a stationary AR($p_0$) model with $p_0 < \infty$
and $k \ge p_0$ (the correctly specified case). However, the equivalence introduced
by (4.9) seems to be more relevant in situations where the predictive
capabilities of the two criteria after order selection are emphasized.
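
The correspondence $P_n = 1 + (1-\delta_n)^{-1}\log\delta_n^{-1}$ is easy to evaluate numerically. The sketch below computes the penalty matching a given $\delta_n$ and inverts the map by bisection; the function names and the bisection tolerance are hypothetical choices made for illustration. The map is strictly decreasing on $(0, 1)$ and tends to 2 as $\delta_n \to 1$, which is why only penalties above 2 can be inverted.

```python
import math

def equivalent_penalty(delta):
    """Penalty weight P = 1 + log(1/delta) / (1 - delta) for 0 < delta < 1."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return 1.0 - math.log(delta) / (1.0 - delta)

def equivalent_delta(P, tol=1e-12):
    """Invert equivalent_penalty by bisection (assumes P > 2)."""
    if P <= 2.0:
        raise ValueError("only penalties above 2 correspond to some delta")
    lo, hi = 1e-300, 1.0 - 1e-12
    # equivalent_penalty is decreasing: large near lo, close to 2 near hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if equivalent_penalty(mid) > P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `equivalent_penalty(1.0 / math.log(n))` gives the penalty weight paired with $\delta_n = (\log n)^{-1}$, and `equivalent_delta(math.log(n))` gives the starting fraction paired with the BIC weight.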

5. Optimal prediction for possibly degenerate AR($\infty$) processes. This
section deals with optimal prediction problems in situations where the
underlying AR($\infty$) process can degenerate to an AR process of finite order.
We first adopt (K.5') to replace the truly infinite-order assumption, (K.5).

(K.5'): The AR coefficients satisfy either

(i) $a_{p_0} \ne 0$ for some unknown $1 \le p_0 < \infty$ and $a_l = 0$ for all $l \ge p_0 + 1$ or

(ii) (3.10).

From a practical point of view, (K.5') is reasonably flexible because it
contains any causal and invertible ARMA($p$, $q$) model, with $p + q \ge 1$, as a
special case. Before tackling order selection problems under (K.5'), a
preliminary result is needed, which shows that $\mathrm{APE}_{\delta_n}$ and $\mathrm{IC}_{P_n}$, with $\delta_n$ and
$P_n$ satisfying certain conditions, are asymptotically efficient in finite-order
cases.
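
As a reminder of what $\mathrm{APE}_{\delta_n}$ computes, the sketch below accumulates squared one-step-ahead prediction errors from stage $n\delta_n$ to $n-1$, refitting an AR($k$) model by least squares at every stage. It is a brute-force illustration rather than the paper's implementation: the function names, the use of `K` as the first regression index, and the clamping of the first stage so that each fit has more rows than columns are assumptions made here.

```python
import numpy as np

def ape(x, k, K, delta):
    """Accumulated squared one-step prediction errors of AR(k) from stage n*delta."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Assumption: clamp the first stage so every least-squares fit is overdetermined.
    start = max(int(np.ceil(n * delta)), K + k + 1)
    total = 0.0
    for i in range(start, n):
        # Stage i: x[0], ..., x[i-1] are observed and x[i] is predicted.
        rows = np.array([x[j - k + 1 : j + 1][::-1] for j in range(K - 1, i - 1)])
        coef, *_ = np.linalg.lstsq(rows, x[K:i], rcond=None)
        pred = x[i - k : i][::-1] @ coef
        total += (x[i] - pred) ** 2
    return float(total)

def select_order_ape(x, K, delta):
    """Order in 1..K minimizing the accumulated prediction error."""
    return 1 + int(np.argmin([ape(x, k, K, delta) for k in range(1, K + 1)]))
```

Choosing `delta` close to 1 keeps only the most recent prediction errors, while a small `delta` accumulates them over most of the sample.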

Theorem 4. Assume that (K.1)-(K.4) and (i) of (K.5') hold. Then
(2.8) and (2.9) hold for $(\hat{k}_{n,OS}, D_n) = (\hat{k}_{n,\delta_n}, D_{\mathrm{APE}_{\delta_n}})$ and $(\hat{k}_{n,P_n}, P_n)$, where
$\delta_n$ satisfies $\delta_n^{-1} \to \infty$ and (3.5), and $P_n$ satisfies $P_n \to \infty$ and $P_n = O(n^s)$
for some $0 < s < 1$. In addition, (2.3) [(2.4)] is satisfied by these criteria.

Remark 9. Since Theorem 4 adopts {AR(1), ..., AR($K_n$)} as the set of
candidate models, where $K_n \to \infty$ at a certain rate, the true model AR($p_0$)
is included asymptotically. Zheng and Loh [31] also took this approach.
However, unlike Theorem 4, their main concern is with the consistency in
order selection.

When (ii) of (K.5') holds, Example 6 points out that $\mathrm{IC}_{P_n}$, with $P_n =
o(\log n)$, possesses asymptotic efficiency. On the other hand, if (i) of (K.5')
is true, then Theorem 4 shows that $\mathrm{IC}_{P_n}$, with $P_n \to \infty$ and $P_n = O(n^s)$,
$0 < s < 1$, is asymptotically efficient under (K.1)-(K.4). These results taken
together suggest that $\mathrm{IC}_{P_n}$, with $P_n \to \infty$ and $P_n = o(\log n)$, simultaneously
achieves (2.3) over the two types of AR processes defined in (i) and (ii) of
(K.5'). According to Example 1 and Theorem 4, $\mathrm{APE}_{\delta_n}$, with $\delta_n^{-1} \to \infty$ and
$\log\delta_n^{-1} = o(\log n)$, also has this property. This discussion is now summarized
in the following theorem.

Theorem 5. Assume that (K.1)-(K.4) and (K.5') hold. Then (2.3)
[(2.4)] holds for $\hat{k}_n = \hat{k}_{n,\delta_n}$ and $\hat{k}_{n,P_n}$, where $\delta_n$ satisfies $\delta_n^{-1} \to \infty$ and
$\log\delta_n^{-1} = o(\log n)$, and $P_n$ satisfies $P_n \to \infty$ and $P_n = o(\log n)$.

As pointed out in Examples 3 and 8, the criteria given by Theorem 5 fail to
preserve asymptotic efficiency when (2.18) is included in (K.5'). To overcome
this difficulty, we propose using an alternative criterion that chooses order
$\hat{k}_n^{(\iota)}$,

(5.1) $\hat{k}_n^{(\iota)} = \hat{k}_{n,2}\, I_{\{\hat{k}_{n^\iota,P_{n^\iota}} \ne \hat{k}_{n,P_n}\}} + \hat{k}_{n,P_n}\, I_{\{\hat{k}_{n^\iota,P_{n^\iota}} = \hat{k}_{n,P_n}\}}$,

where $0 < \iota < 1$, $P_n \to \infty$, $\hat{k}_{n^\iota,P_{n^\iota}} = \arg\min_{1 \le k \le K_{n^\iota}} \mathrm{IC}_{P_{n^\iota}}(k)$ and

$\mathrm{IC}_{P_{n^\iota}}(k) = \log\hat\sigma^2_{n^\iota}(k) + \dfrac{P_{n^\iota}k}{N_\iota}$,

with $\hat\sigma^2_{n^\iota}(k) = (1/N_\iota)\sum_{j=K_{n^\iota}}^{n^\iota-1}(x_{j+1} + \hat{a}'_{n^\iota}(k)x_j(k))^2$, $N_\iota = n^\iota - K_{n^\iota}$,

$\hat{a}_{n^\iota}(k) = -\hat{R}^{-1}_{n^\iota}(k)(1/N_\iota)\sum_{j=K_{n^\iota}}^{n^\iota-1} x_j(k)x_{j+1}$,

and $\hat{R}_{n^\iota}(k) = (1/N_\iota)\sum_{j=K_{n^\iota}}^{n^\iota-1} x_j(k)x'_j(k)$ (note that without loss of
generality, $n^\iota$ and $K_{n^\iota}$ are assumed to be positive integers). As observed, (5.1) is a
hybrid selection procedure that combines AIC and a BIC-like criterion.
If the true order is finite, then it is expected that the orders selected
by the BIC-like criterion at stages $n^\iota$ and $n$ will ultimately be the same
due to consistency. On the other hand, when the true order is infinite, an
interesting result is derived showing that it is nearly impossible for the BIC-like
criterion to choose the same order at these different stages; see Appendix D.
Therefore, it is reasonable to adopt $\hat{k}_{n,2}$ (the order selected by AIC) if $\mathrm{IC}_{P_n}$
and $\mathrm{IC}_{P_{n^\iota}}$ determine different orders, and $\hat{k}_{n,P_n}$ (the order selected by the
BIC-like criterion) otherwise. Theorem 6 justifies the validity of $\hat{k}_n^{(\iota)}$.
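
The hybrid rule in (5.1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the helper `ic_order`, the argument names, the use of the first $n^\iota$ observations for the early stage, and the user-supplied `penalty_fn` for the BIC-like weight are all hypothetical choices.

```python
import numpy as np

def ic_order(x, K, penalty):
    """Order in 1..K minimizing log(sigma2(k)) + penalty * k / N, with N = n - K."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    N = n - K
    y = x[K:]
    ic = []
    for k in range(1, K + 1):
        # Row for index j holds (x[j], ..., x[j-k+1]) and predicts x[j+1].
        rows = np.array([x[j - k + 1 : j + 1][::-1] for j in range(K - 1, n - 1)])
        coef, *_ = np.linalg.lstsq(rows, y, rcond=None)
        sigma2 = np.mean((y - rows @ coef) ** 2)
        ic.append(np.log(sigma2) + penalty * k / N)
    return 1 + int(np.argmin(ic))

def two_stage_order(x, iota, K, K_sub, penalty_fn):
    """Keep the BIC-like order if stages n**iota and n agree; otherwise use AIC."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_sub = int(n ** iota)  # assumption: early stage uses the first n**iota points
    k_full = ic_order(x, K, penalty_fn(n))
    k_sub = ic_order(x[:n_sub], K_sub, penalty_fn(n_sub))
    if k_full == k_sub:
        return k_full
    return ic_order(x, K, 2.0)  # AIC weight
```

Agreement between the two stages is treated as evidence of a finite true order, so the BIC-like choice is kept; disagreement falls back to the AIC choice.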

Theorem 6. Let (K.1)-(K.4) and (K.6) (with Dn = 2) hold, and t and 
Pn in (5.1) satisfy Q < i <l, P„ ^ oo, P„ = 0{n'-^), with < ti < (1 + 
Si)/{2 + 51), and Pn/Pn^ = 0{1) for some i' > 0. Further, assume that the 
AR coefficients meet either of the following conditions: 

(i) (i) of (K.5'); 

(ii) for any $\xi > 0$, there are a nonnegative exponent, $0 \le \theta = \theta(\xi) < 1$,
and a positive number, $M = M(\xi) > 0$, such that

(5.2) limmf mm [k„ p f ; — r — ■ > 0, 

and for all sufficiently large n, 

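where $A_{P_n,\theta,M}$ is defined in (K.6), $\emptyset$ denotes the empty set,

$\bar{A}_{P_n,\theta,M} = \{k : 1 \le k \le K_n,\ k \notin A_{P_n,\theta,M}\}$

and

$\bar{A}_{P_{n^\iota},\theta,M} = \{k : 1 \le k \le K_{n^\iota},\ k \notin A_{P_{n^\iota},\theta,M}\}$.

[Note that (5.3) implicitly implies that $a_l \ne 0$ for infinitely many $l$.]

Then (2.3) [(2.4)] holds for $\hat{k}_n = \hat{k}_n^{(\iota)}$.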

As an application of Theorem 6, it is shown in Example 9 below that $\hat{k}_n^{(\iota)}$,
$0 < \iota < 1$, is asymptotically efficient when the true model is either (i) an AR
process of finite order, (ii) an AR($\infty$) process with coefficients satisfying
(3.10) (the exponential-decay case) or (iii) an AR($\infty$) process with coefficients
satisfying (2.18) (the algebraic-decay case). To simplify the discussion,
let $P_n$ be given by (4.6) with $C_1, C_2 > 0$, which satisfies all requirements for
$P_n$ imposed by Theorem 6.

Example 9. Assume that (K.1)-(K.4) hold, and either (K.5') or (2.18)
is satisfied. To show that $\hat{k}_n^{(\iota)}$, $0 < \iota < 1$, is asymptotically efficient in this
situation, in view of Theorem 6 and the discussion after (K.6), it suffices
to show that (5.2) and (5.3) are guaranteed by (3.10) as well as (2.18).
First, assume that (3.10) is true. Then Example 6 shows that (3.12), with
$D_n = P_n$, is valid, which further implies that for any $\xi > 0$, (5.2) holds for any
$1 - \min\{\xi, 1\} < \theta < 1$ and any $M > 0$. In addition, (5.3) follows from (3.12)
(with $D_n = P_n$), (4.6) (with $C_1, C_2 > 0$), $0 < \iota < 1$, and $0 < \theta < 1$. Next, let
(2.18) hold. Reasoning as for Example 8, we obtain (3.20) (with $D_n = P_n$)
and that for any $\xi > 0$, (5.2) holds for any $1 - (\min\{\xi, 2\}/2) < \theta < 1$ and
any $M > 0$. Moreover, (5.3) is guaranteed by (3.20) (with $D_n = P_n$), (4.6)
(with $C_1, C_2 > 0$), $0 < \iota < 1$, and $0 < \theta < 1$. Consequently, the desired result
follows.

6. Simulation results. To illustrate the practical implications of our
theoretical results, a simulation study is conducted in this section. Let
observations be generated from the ARMA(1, 1) model

$x_{t+1} = \phi_0 x_t + \varepsilon_{t+1} + \theta_0\varepsilon_t$,

where the $\varepsilon_t$'s are independent and identically $N(0, 1)$ distributed and
$(\phi_0, \theta_0)$ = (0.0, 0.98), (0.5, 0.8), (0.5, 0.4), and (0.9, 0). For each combination
of $(\phi_0, \theta_0)$, the empirical estimates of

$\dfrac{E\{x_{n+1} - \hat{x}_{n+1}(\hat{k}_{n,2})\}^2 - 1}{E\{x_{n+1} - \hat{x}_{n+1}(\hat{k}_n)\}^2 - 1}$,

denoted by $\mathrm{RE}(\hat{k}_n)$, are obtained based on 5000 replications for n = 180,
300, 500, and 1000, where $\hat{k}_n = \hat{k}_{n,\delta_n}$, $\hat{k}_{n,P_n}$, $\hat{k}_n^{(\iota)}$ with $\delta_n = (\log n)^{-1}$,
$1 - (2/3)(\log n)^{-0.1}$, $1 - (2/3)(\log n)^{-0.12}$, $1 - (2/3)(\log n)^{-0.14}$, $P_n = 2.001\log\log n$
(HQ), $\log n$ (BIC), and $\iota$ = 0.69, 0.72, 0.75. The penalty term of the BIC-like
criterion associated with $\hat{k}_n^{(\iota)}$ is given by (4.6) with $C_1 = 0.8$ and $C_2 = 1$. In
addition, Kn and Kn^ are set to the largest integers less than or equal to n^/^ 
and Ji''/^, respectively. Obviously, RE{kn) measures the relative prediction 
efficiency of $\hat{x}_{n+1}(\hat{k}_n)$ to $\hat{x}_{n+1}(\hat{k}_{n,2})$ (AIC), and $\mathrm{RE}(\hat{k}_n) > 1$ [$\mathrm{RE}(\hat{k}_n) < 1$]
suggests that $\hat{k}_n$ performs better (worse) than $\hat{k}_{n,2}$. These empirical results
(see Table 1) are summarized as follows:

(1) AIC and BIC. The relative efficiencies of AIC and BIC are clearly
affected by the magnitude of the MA parameter in finite-sample situations.
Table 1 shows that when $\theta_0 \ge 0.8$, AIC notably outperforms BIC, which
coincides with our theoretical findings in Examples 7 and 8 that BIC is
not asymptotically efficient in truly AR($\infty$) models. In contrast, values of
$\mathrm{RE}(\hat{k}_{n,\log n})$ are larger than 1 when $\theta_0 = 0.4$. However, since these values
rapidly decrease from 1.26 to 1.08 as n grows from 180 to 1000, the
theoretical result just mentioned does not seem to be seriously violated. On the
other hand, when $\theta_0 = 0$, values of $\mathrm{RE}(\hat{k}_{n,\log n})$ are larger than 3.4, and do
not exhibit any decreasing trend. This matches the fact that BIC is consistent
and asymptotically efficient in finite-order AR models (see Section 5),
whereas AIC is not.
Table 1
Empirical estimates of RE(k̂_n)

                              APE_{δ_n}                                   Two-stage
   n    (φ_0, θ_0)     δ_{1,n}  δ_{2,n}  δ_{3,n}  δ_{4,n}     HQ     BIC    ι=0.69  ι=0.72  ι=0.75

  180   (0.0, 0.98)     0.88     0.93     0.92     0.93      0.89    0.78    0.95    0.94    0.94
        (0.5, 0.8)      0.95     0.95     0.95     0.94      0.98    0.83    0.98    0.97    0.97
        (0.5, 0.4)      1.28     1.07     1.05     1.03      1.36    1.26    1.08    1.08    1.08
        (0.9, 0.0)      2.21     1.34     1.33     1.28      2.31    3.59    1.81    1.86    1.95

  300   (0.0, 0.98)     0.88     0.94     0.94     0.94      0.89    0.74    0.97    0.96    0.95
        (0.5, 0.8)      0.98     0.99     0.98     0.97      0.96    0.79    0.95    0.94    0.93
        (0.5, 0.4)      1.28     1.03     1.03     1.03      1.24    1.24    1.09    1.09    1.09
        (0.9, 0.0)      2.18     1.37     1.32     1.26      2.44    3.46    1.95    1.99    2.07

  500   (0.0, 0.98)     0.85     0.94     0.95     0.95      0.85    0.68    0.96    0.95    0.94
        (0.5, 0.8)      0.97     0.97     0.97     0.96      0.98    0.78    0.97    0.95    0.95
        (0.5, 0.4)      1.28     1.10     1.05     1.04      1.32    1.17    1.03    1.02    1.06
        (0.9, 0.0)      2.31     1.36     1.31     1.27      2.64    4.17    2.39    2.43    2.41

 1000   (0.0, 0.98)     0.86     0.95     0.96     0.95      0.86    0.66    0.99    0.98    0.98
        (0.5, 0.8)      1.05     0.97     0.96     0.96      1.01    0.80    0.97    0.97    0.95
        (0.5, 0.4)      1.36     1.12     1.09     1.04      1.37    1.08    1.00    1.00    0.98
        (0.9, 0.0)      2.33     1.27     1.26     1.21      2.86    4.07    2.65    2.74    2.67

Note: δ_{1,n} = (log n)^{-1}, δ_{2,n} = 1 - (2/3)(log n)^{-0.1}, δ_{3,n} = 1 - (2/3)(log n)^{-0.12}
and δ_{4,n} = 1 - (2/3)(log n)^{-0.14}.

(2) HQ and $\mathrm{APE}_{\delta_{1,n}}$, where $\delta_{1,n} = (\log n)^{-1}$. First note that the prediction
efficiencies of these two criteria seem quite close. They perform comparably
to AIC when $\theta_0 = 0.8$, and much better than it when $\theta_0 \le 0.4$.
This phenomenon can be explained by the fact that HQ and $\mathrm{APE}_{\delta_{1,n}}$ are
asymptotically efficient in both the finite-order AR model and the AR($\infty$)
model with AR coefficients decaying exponentially (see Theorem 5). Their
efficiencies, however, are smaller than that of AIC in the case $\theta_0 = 0.98$. Since it is
difficult to distinguish between an MA(1) process with a very large MA
coefficient and an AR($\infty$) process with AR coefficients decaying algebraically
in finite samples, Examples 3 and 8 (which show that HQ and $\mathrm{APE}_{\delta_{1,n}}$ are
not asymptotically efficient in the algebraic-decay case) may explain why
HQ and $\mathrm{APE}_{\delta_{1,n}}$ perform worse than AIC when $\theta_0$ is very close to unity. In
addition, we also observe that these two criteria are not as efficient as BIC
in the case $\theta_0 = 0$, but they are at least as efficient as BIC in all other cases.

(3) $\mathrm{APE}_{\delta_{i,n}}$, $i = 2, 3, 4$, where $\delta_{2,n} = 1 - (2/3)(\log n)^{-0.1}$, $\delta_{3,n} = 1 - (2/3)(\log n)^{-0.12}$
and $\delta_{4,n} = 1 - (2/3)(\log n)^{-0.14}$. Table 1 shows that $\mathrm{APE}_{\delta_{i,n}}$, $i = 2, 3, 4$,
holds a slight advantage (disadvantage) over AIC when $\theta_0 = 0.4$
($\theta_0 \ge 0.8$). However, since the amount of the advantage (disadvantage) is
not sizable, these Monte Carlo results seem to support the theoretical findings
revealed in Examples 4-6 and 8 that AIC and these $\mathrm{APE}_{\delta_{i,n}}$ criteria
are asymptotically equivalent in both the exponential- and algebraic-decay
cases. On the other hand, these criteria tend to outperform AIC when $\theta_0 = 0$.
But they are still much less efficient than all other criteria in this case due
to the lack of consistency in the finite-order case (see Remark 5).

(4) Two-stage criteria. One special feature of two-stage criteria is that
they behave like AIC in situations where AIC dominates, and improve
substantially over AIC in situations where AIC performs poorly. More specifically,
values of $\mathrm{RE}(\hat{k}_n^{(\iota)})$ are rather close to 1 when $\theta_0 \ge 0.8$, and significantly
larger than 1 when $\theta_0 = 0$. In the case $\theta_0 = 0.4$, the two-stage criteria perform
slightly better than AIC when $n \le 300$, and comparably to AIC when
$n \ge 500$. These simulation results seem to match quite well with the conclusion
drawn from Example 9 that the two-stage criteria are asymptotically
efficient in all three (finite-order, exponential-decay, and algebraic-decay)
cases. When $\theta_0 = 0$, the prediction performance of the two-stage criteria is
similar to that of HQ and $\mathrm{APE}_{\delta_{1,n}}$ (particularly when n is large), but worse
than that of BIC.

In conclusion, note that the finite-sample behavior of the criteria considered
in this section can be well predicted by the asymptotic results obtained
in Sections 3-5. Some desirable features (when compared to AIC or BIC)
of the $\mathrm{APE}_{\delta_n}$, HQ, and two-stage criteria are particularly encouraging (see
the discussions above). The tuning parameters adopted in this section may
also serve as good initial values for pursuing better finite-sample efficiencies.
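
The simulation design described at the start of this section can be sketched as follows for the information-criterion-based rules. This is a small-scale illustration, not the code used to produce Table 1: the function names, the burn-in length, the default number of replications, and the requirement that the caller supplies the maximal order `K` are assumptions made here. Both runs inside `relative_efficiency` use the same seed, so AIC and the competing rule see identical simulated series.

```python
import numpy as np

def simulate_arma11(n, phi, theta, rng, burn=200):
    """Generate n values of x[t] = phi*x[t-1] + e[t] + theta*e[t-1], e ~ N(0, 1)."""
    total = n + burn
    e = rng.standard_normal(total)
    x = np.zeros(total)
    for t in range(1, total):
        x[t] = phi * x[t - 1] + e[t] + theta * e[t - 1]
    return x[burn:]

def fit_ar(x, k, K):
    """Least-squares AR(k) fit on the responses x[K:], returning (coef, sigma2)."""
    rows = np.array([x[j - k + 1 : j + 1][::-1] for j in range(K - 1, len(x) - 1)])
    y = x[K:]
    coef, *_ = np.linalg.lstsq(rows, y, rcond=None)
    return coef, float(np.mean((y - rows @ coef) ** 2))

def ic_order(x, K, penalty):
    """Order in 1..K minimizing log(sigma2(k)) + penalty * k / N, with N = n - K."""
    N = len(x) - K
    ic = [np.log(fit_ar(x, k, K)[1]) + penalty * k / N for k in range(1, K + 1)]
    return 1 + int(np.argmin(ic))

def excess_mspe(n, phi, theta, penalty, K, reps, seed):
    """Monte Carlo estimate of E(x_{n+1} - prediction)^2 - 1 after order selection."""
    rng = np.random.default_rng(seed)
    sq_err = np.empty(reps)
    for r in range(reps):
        z = simulate_arma11(n + 1, phi, theta, rng)
        x, target = z[:n], z[n]
        k = ic_order(x, K, penalty)
        coef, _ = fit_ar(x, k, K)
        sq_err[r] = (target - x[n - k:][::-1] @ coef) ** 2
    return float(sq_err.mean() - 1.0)

def relative_efficiency(n, phi, theta, penalty, K, reps=200, seed=0):
    """Ratio of AIC's excess MSPE to that of the rule with the given penalty."""
    aic = excess_mspe(n, phi, theta, 2.0, K, reps, seed)
    other = excess_mspe(n, phi, theta, penalty, K, reps, seed)
    return aic / other
```

With few replications the excess MSPE is noisy and may even be negative, so a ratio computed this way is only meaningful for a large number of replications.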

APPENDIX A: PROOFS OF PROPOSITION 2 AND (2.13) 



Proof of proposition 2. In view of (1.7) and (2.1), 

gn(fcn,Os) - 0"^ 
(A.l) = E{i{kn,Os) - {{K,dJ + S{kn,Os) 

It is also not difficult to see that 

(A.2) %^^^^^ = 0(D„-1). 

By (A.l), (A.2), (2.2), (2.8) and (2.9), (2.10) follows. Moreover, if (2.11) is 
assumed, (2.10) can be rewritten as 

(A.3) lim ^-(^"'Q;)"^' = 1, 
and hence (2.3) [or (2.4)] holds for $\hat{k}_n = \hat{k}_{n,OS}$. □

Proof OF (2.13). Under (K.l), (K.5) 
o(n^/^), an argument given in [14], page 2448 yields 

(A 4) p lim ^'^^^"+^ - yn+i{kn,os)f\xi, . . . ,Xn} - a"^ 



The desired result now follows from (2.12) and (A. 4). □ 

APPENDIX B: PROOFS OF THEOREMS 1 AND 2 

In the rest of this paper, C is used to denote a generic positive constant 
independent of the sample size n and of any index with an upper (or lower) 
limit dependent on n. It also may have different values in different places. 
We start with a modification of Lemma 6 of [14] . 

Lemma B.l. Assume that (K.l) holds with X^i^i l^^^^^il < oo replaced 
by X^i^i Wi\ < oo and sup-oo<t<ooE\et\'^'^ < oo for some q>2. Let {rrii^n}, 
i= 0, 1, 2, be sequences of positive integers satisfying m2^n > > mo,n for 
all n>\. Then, for all n>\ and all 1 <k,j < mo,n; 

^l^m,„,™,„(fc)-^i-('5^,„,„^,„(j)-^')r<Cm-«/'l|a(i)-a(A;)||?j, 

(B.l) 

where mn = m2,„ - mi,„ + 1, 5'^i_„,m2,„(^) = (iMn) Ej!Lmi,„ 4+i,k' ^U) 
and a{k) in (B.l) are viewed as infinite- dimensional vectors with unde- 
fined entries set to zero and a'^ = E{ef ^). Also note that ||a(j) — a(/c)|||j = 
|||a-a(j)|||-||a-a(A:)||2j|. 

The proof of (B.1) is similar to that of [14], Lemma 6, and hence is
omitted. Let $n\delta_n$, with $1/n \le \delta_n \le 1 - (1/n)$, be a positive integer. According
to (3.3), for $k \ne k^*_{n,D_n}$,


P{kn,s^ = k)<P{ 

(B.2) 



^APE^JA:) ^ APE^J/c;,,^ 



UsAk) - Us^k) 
<Y.p{mk)\>^VnMk)], 



1=1 ^ 12 

where Dn = -Dape^,^ is used throughout this appendix, Us„ (k) = n{l — 6n)Ln,D„ 

Vn,DAk) = {Ln,DAk) " Ln.D^ikloJ) K^dJ^) , 

n-1 



\N,ik)\ = Ui^}ik) 



J2 hi{k)ef^i^k\ - ka"^ log 6 J 

\i=n5n ' 



\N2ik) 

\N,ik) 
\Ne{k) 

\Nrik) 
\Ns{k) 
\Ngik) 
\Nioik) 
\Nu{k) 
J2 ^iikn,D„)e. 



i+l,k* 



kn D- 5„ ^ 



-nSn 



Us'^\k)\Q^sAk)-ka'\, 
U^^{k)\Qn{k)-ka\ 

^sJ^ ik)\Qn{kn^Dn) ~ kn,Dn,'^'^\i 



Ul'ik) 

Us:ik) 
'^Ul\k) 
^^Ui^\k) 

u.Hk) 



n-l 

hi{k)e. 

i=n5n 
n-l 



^i(k*i,Dj^l,kl^Ur, 

n-l 

Y ^iik)ei,kei+i,k , 

i=nSn 
n-l 



=n6n 



n-l 



{^i+l{k) — £i+l{kn,D„) 
i=n5„ 

- \\^-^{k)\\%+\\a-a{k;^^^J\\%} 



\N,2{k)\ = Ui^\k) 



n-l 



Y (ei+i,fc - ei+i,fc*^^^)ei+i 

i=nSn 



and ei+i{k) = e^+i^fc - e^+i. 

By (B.2), Chebyshev's inequality, and moment bounds for $|N_l|$, $l = 1, \ldots, 12$
(see Lemmas B.2-B.4 below), an upper bound for $P(\hat{k}_{n,\delta_n} = k)$ can be
obtained. This upper bound plays an important role in verifying Theorems 1
and 2.



Lemma B.2. Let the assumptions of Proposition 1 hold and $1/n \le \delta_n \le
1 - (1/n)$ satisfy (3.5). Then, for $q > 0$, all $1 \le k \le K_n$ and all sufficiently
large $n$,

(B.3) E{\Nr{k)\'^)<CU^;^{k)i^-^^^^+{\og5~^^^^^ 
and 



(B.4) 



Proof. We only prove (B.3) because the proof of (B.4) is similar. Define

$D(i,k) = x_i'(k)R^{-1}(k)x_i(k)(i+1-K_n)^{-1}$ and $E(i,k) = k(i+1-K_n)^{-1}$.
Then



UsM\Ni{k)\< 



(B.5) 



n-l 

^ {K{k) - D{i,k))ei^^^^ 

i=n5„ 



+ 



n-l 



+ 



n-l 

5: D(i,fc)(e2+,-cT2 



n-l 



E{i,k)-klog6-' 

i=nSn 



+ J2 {D{i,k)-E{i,k))a^ 

i=n5n 

= l{k) + + + IV(A;) + V(A;). 

By (3.5) and [14], Proposition 1, we have, for any $q > 0$, all $n\delta_n \le i \le n-1$,
all $1 \le k \le K_n$ and all sufficiently large $n$,
(B.6) ■ " " " 



E\\Rr_}^{k)-R~\kW<C- 



Using [28], Lemma 2, and Jensen's inequality, it follows that for any r > 0, 
all n5n < i < n — 1 and all 1 < A: < Kn , 

(B.7) Ei\\Mk)\n<Ck'/^ 

and 

(B.8) E\ei+i,k\' <C. 

According to (B.6)--(B.8), Minkowski's inequality and Holder's inequality 
we have, for q>l, all 1 < A; < Kn and all sufficiently large n, 

(B.9) Emy<( J2 ||(/i.(fc)-I)(^,fc))ef+i,,|l V<Cfc25(nd„)-'?/2, 

\i=nS„ / 

where for a random variable z and positive number s, \\z\\s = -E(|z|'')"^/*. 

To deal with II(A:), note that the first moment bound theorem of Findley 
and Wei [4] and Jensen's inequality yield for any r > 0, all Kn < i < n — 1 
and all 1 < A; < Kn , 



(B.IO) 



E{\ji.[{k)R-\k)Mk) - kn < C¥''^. 
Reasoning as for (B.8), we have, for any $r > 0$, all $K_n \le i \le n - 1$ and all
$1 \le k \le K_n$,



(B.ll) 



Ei\e.+,{k)n<C\\^-^{kW^. 



(B.IO), (B.ll), [28], Lemma 2, and an argument similar to that used for 
obtaining (B.9) together imply that for q>2 and all 1 <k < Kn, 



E{U{k)y<C^E 
(B.12) 



n— 1 



J2 D{i,k)el,{k) 



+ E 



n-l 



D{i,k)e,+i{k) 



i=nSn 



< C{{\og5-yk^s. - a(A:)|||^ + [nSnY'^'^k^s.- s.{k)\\%}. 



Similarly, 



g/2 



(B.13) ^(III(A;))'^ < Yl ^^(^.^) < Cfc^Mn)""/^ 



holds for q>2 and all 1 < A; < K^- 

To deal with IV(A;), it can be shown by some algebraic manipulations that 



IV(A;) = a' 



n5n - Kn 



N 



+ E 



Ti-i{k) 



{i-Kn){i + l-Kn) 



where Ti{k) = X]j=_ft'„ ^iik)R~^{k)xi{k) — k. By an argument similar to that 
given in the proof of Lemma 3 of [14] and Jensen's inequality, one has for 
any q> 0, all n5n — 1 <i <n — 1 and all 1 < A; < Kn, 



E 



T^{k) 



i + l-Kn 



<C- 



^3g/2 



(i + 1 - Kn)l/^ ■ 

This and the Minkowski inequality yield that for q>l and all 1 < /c < Kn, 

(B.14) E{W{k)y < Ck^'i/^{n5n)-'^'^. 

In addition, it is straightforward to show that for all 1 < A; < Kn, 



(B.15) 



\{k) < C 



1-Sn 



Kr, 



n 



Consequently, (B.3) follows from (B.5), (B.9), (B.12)-(B.15), Jensen's in- 
equality, and the fact that for any r > 0, 



(B.16) 

which is ensured by (K.l). □ 



lim rila-a(A:)l||j'' = 0, 

fc^oo 
Lemma B.3. Under the assumptions of Lemma B.2, for $q > 0$, all $1 \le
k \le K_n$ and all sufficiently large $n$,



(B.17) 




(B.18) 


E{\N,{m < cu^:{k){K:^DMn)-'" + K%} 


(B.19) 


EilN^ik)]") < CU^^{k)(e^n~'i''^ + 


(B.20) 


E{\N,{k)\'^) < CUi^^{k){k:Xn~^/' + k:'^'l), 


(B.21) 


E{\N,{k)\'')<CUl\k)k^'i{n5n)-'^, 


(B.22) 


E{\N,{kW) < CUi^\k)k:%Jn5nr', 


(B.23) 


E{\NQ{k)\'') < CU^^{k)e'i{n5n)-''''^ and 


(B.24) 


i?(iiVio(fc)r) < cui\k)kt:r>Mnr'" ■ 



Proof. See [12], Lemmas A.7 and A.8. □

Lemma B.4. Let the assumptions of Lemma B.1 hold and $1/n \le \delta_n \le
1 - (1/n)$. Then, for $q \ge 2$ and all $1 \le k \le K_n$, with $K_n \le n\delta_n$,



(B.25) i5([A^,(^)|«) <CC/,7/^(fc)L;^^^(A:)||a(A:) -a(^;,J||?, 



where z = 11 and 12. 



Proof. First note that 



E\LnMk)Nii{k)V 



(B.26) 




77,(1 - 5n) 




Lemma 2 of [28] and the convexity of x'^^'^ ,x > 0, yield for all 1 < k < K, 



(11) < 



{n(l-5„)}(9/2)+i . 
C||a(fc)-a(fc;^^J| 

(1 - (5n)9/2n''/2 



^(|ei+i,fc - ei+i,fc. 1^) 



(B.28) 



< 
Consequently, (B.25), with $l = 11$, is ensured by (B.26)-(B.28). The proof is
completed by noting that (B.25), with $l = 12$, is an immediate consequence
of (B.28). □

Armed with Lemmas B.2-B.4, we have the following result.

Corollary B.l. Let (K.1)-(K.5), (3.4) and (3.5) hold. Then, for any 
r>0, 



(B.29) lim i?f:^!ii^4^!ii^-l 



0. 



Proof. Define Ii,n{k) = Ln,D„{k)/Ln,D„{kn dJ- Let e > be arbitrar- 
ily given. Then, by (B.2), one has 

(B.30) =Y,{h,^{k)-lYP{K,s„=k) 

k=l 

<^' + fl\ E (/i,n(fc)-l)'^n|iVz(A:)|>(l/12)K,z>„(fe))|, 

where A^^^ll^ = {k:l<k< Kn,h,n{k) - 1 > e}. In view of (B.30), (B.29) 
holds if for / = 1, ... , 12, 

(B.31) lim V (/i,„(A;)-l)'^P(|iV,(A;)|>(l/12)K,D„(A;))=0. 

In the following, we only prove (B.31) for $l = 1, 3$ and 11 because the proofs
for $l = 2, 7, 8, 9$ and 10 are similar to that for $l = 1$, proofs for $l = 4, 5$ and 6
are similar to that for $l = 3$, and the proof for $l = 12$ is similar to that for
$l = 11$.

By (B.3), Chebyshev's inequality, (3.4), (3.5) and the facts that 
(B.32) Ln,DM > l|a-a(A:)||2j, nL„,BjA;) > C-^^°^'^" ' 



l-<5„ 



and /i,„(A;) < C/Lr,DAK,Dj I < k < k*^^^ and h,nik) < Ck/k^j,^ if 
kn Dn ^ k < Kn, we have, for sufficiently large q, 

Y: ihAk) - iYpm{k)\ > (i/i2)K,D„(fc)) 



<c y: A^,n(A^)K.ir^(^){feva+(/2,nfc)'^n-n 
(B.33) 



<C\ 



•1 + e 



V e 



q—r 



rk* 



E 



L k= 



k'}{l-6n 



+ E 

k=k* „ +1 



fink'' 
'^n,DnJl,n n,D„ 



+ 



where /i,„ = (log(5„ /2,„ = log(5.„ V(l-<^n) and /3,n= log5„ V(^'^n)- 

Therefore, (B.31) holds for / = 1. 

By (B.17), (B.25), an argument similar to that used for obtaining (B.33) 
and the fact that /c* ^ oo as n ^ oo, for sufficiently large q, 

Y: ihAk) - lYPm{k)\ > {l/12)Vn,DAk)) 



(B.34) 



<C 



fc=i 

+ E k'^/'Ui^'{k)Iln{k)]=o{l) 



k=k'' „ +1 



and 



Y: ihAk) - lYPi\Nuik)\ > (l/12)K,B„(fc)) 



keA 



(■5n) 



(B.35) < C 



1 + e 



9/2/ 



fc=l 



:0(1). 



In view of (B.33)-(B.35), the proof is complete. □ 

Corollary B.2. Assume that (K.1)-(K.6) hold and $\delta_n$ satisfies (3.4)
and (3.5). Then, for sufficiently large $q$,



E 



(B.36) 



{Ln,DAkn,5jy/' 



2q 



o((fc;,,,j-(^-')''+')+o((iog5-i)-'^) 
+ o((iog5-i)-^/2(fc;^j(-^/2)+^), 



where S{k) is defined in Section 2 and < 6 = 9{^) < 1 is any exponent 
obtained from (K.6) with ^ satisfying (3.7). 
Proof. Let $\xi$ satisfy (3.7). Then (K.6) guarantees that there are $0 \le \theta =
\theta(\xi) < 1$ and $M = M(\xi) > 0$ such that (2.16) is satisfied. Let $(\theta, M)$ be any
such pair. Define $I_{2,n}(k) = L_{n,D_n}(k) - L_{n,D_n}(k^*_{n,D_n})$. By Hölder's inequality
and the fact that for any $h > 0$,

E\S{k)-S{K^j,J\"^ < C||a(fc) -a(fe;^J||f 

(B.37) 

< C{l2,n{k) + f2,nN-'\k - K.^^Ja^f 

(which follows from [28], Lemma 2, (K.3) and the definition of k'^^^J, one 
has for g > and 1 < r < oo , 



E 



S{kn,sJ-S{k*o^] 



L 

<i:(^ 



2q 



5(fe)-5(A:;^J 2</r-xl/r 



k=l 



P^'-^^/\kn,5^=k) 



(B.38) 



k=l 



P^'-^^'\kn,6^=k) 



< C E yn,Djk)P^''~'^/^k,S„ =k)+ 



k = l 



NL 



n,D, 



Xk) 



NLnMk) 

P^''-^^l'{K,5^ = k) 



^C{(I) + (!!) + (Ill)}, 

where A^^^q^m is a set of positive integers defined in (K.6). 

By the definitions of ^D„,e,A/, Ln,D„{k) and Ln,Dn{kn,Dn)i *° 
see that 

(B.39) 



(ii)<c(^^;^j"(i-^)'^+^ 



In view of (B.2) and the fact that for a,5 > 0, (a + 5)('^~i)/'^ < a^'^-^)/^ + 
^('■-i)/'^'^ one obtains 

(B.40) {i)<j:\Y.ylDSk)p'^'-'^'^m{k)\ > (i/i2)K,D„(fe))|. 

1=1 U=i J 

In the following, we shall show that when q is sufficiently large, 

(B.41) Y.yn,Djk)P^'-'^/^m{k)\ > il/12)Vn,DAk))=o{{log5~Y'), 

k=l 
for $l = 1, \ldots, 10$; and

k=i 

(B.42) 

= 0((log5-i)-^/2(fc;^j("^/2)+^)+o((log5-i)-'^), 
for / = 11 and 12. As a result, one has for sufficiently large q, 
(B.43) (l) = 0((log<5-i)-''/2(fc;^j(-«/2)+^) + o((log5-^)-'^). 

By Lemma B.2, (3.4), (3.5), (K.4) and (B.32), for sufficiently large q, 

T.KDjk)P^'-'^/H\N,{k)\ > (1/12)K,D„(A:)) 



k=l 

l(r-l)/r 

k=l 

C ki (log5-i)29A;9^ 



(B.44) <Cj2[E{\Ni {k)\i^'/^^'-^'^ }] ( 



< 



which yields (B.41) for I = 1. For / = 3, according to (3.5), Lemma B.3, 
(B.32) and the fact that k^ ^ oo as n ^ oo, one has for sufficiently large 

Y.yiDSk)p^'~'^/'m{k)\ > {i/i2)Vn,DAk)) 



k=l 



< 



C5][i?{|iV3(A;)|W(-i)}]{ 



r— l)/r 



k=l 

(B.45) 

= o((log5-i)-''). 

The proofs of (B.41) for $l = 2, 7, 8, 9$ and 10 are similar to that of (B.44)
and the proofs of (B.41) for $l = 4, 5$ and 6 are similar to that of (B.45). We
skip the details in order to save space. The proof of (B.42) is a bit more
complicated. By (2.16), Lemma B.4, (B.37), (3.4) and the restriction on $\xi$,
one has for sufficiently large $q$,

T.KDjk)P^^~'^/'^m{k)\ > {l/12)VnMk)) 
k=l 
(B.46) 



gr/(r-l)x{r-l)/r 



fe=i 



+ E V-$Sk){E\Nii{k)\ 



2gr/(r-l)|(r-l)/' 



fe=i 

g ||a(A:)-a(fc;^J||^ 



+ E 



llnik) + \ihn{k-K,Dj)/N\' 



<c 



k-1 



n,D„ 



unmjk) 



+ 



E ^n,Dr, + E ^ 



k=l 



0((log5-i)-'^/2(A;;^j(-'?/2)+^) + o((log5„"i)-^), 



where Z = 11 or 12. 

Following arguments similar to those used to obtain (B.40) and (B.44)- 
(B.46), it can be shown that 



12 ( K„ 

(ni)<E E 

1=1 { k=i 



f2,n{k-kln) 



(B.47) 



(iVL„,^Jfc)) 
xp(-i)A'(|A^K^)l>(l/12)K,D„(fc)) 



ocaogj-i)-"), 



where the equality holds for sufficiently large q. (For a detailed proof of 
(B.47), see [12], Corollary A.2.) Consequently, (B.36) is ensured by (B.38), 
(B.39), (B.43) and (B.47). □ 
Corollary B.3. Assume that the assumptions of Corollary B.2 hold.
Then, for sufficiently large $q$,



(B.48) 



lim E 

n— ►oo 



n,Dn 



2q 



Proof. Equation (B.48) can be verified using arguments similar to
those in the proofs of [14], Lemmas 7 and 8, and Corollary B.2 above. For
details, see [12], Corollary A.3. □

We are now ready to prove Theorem 1. 



Proof of Theorem 1. Let $(\eta, \theta)$ be a pair satisfying (3.6), where $\eta > 0$
and $0 \le \theta = \theta(\xi) < 1$ is obtained from (K.6) with $\xi$ obeying (3.7). Then, by
using Hölder's inequality, Jensen's inequality, Corollaries B.1 and B.2, (3.4)
and (3.6), and taking $q > \max\{\eta^{-1}, 1\}$,



{SCkn,5j-^iK,Djy 



< {Dn - 1) 



\S{K,s„)-S{Kn^)\^'^^V'l 



E 



(B.49) 



E 



n,Dn\'^n,5n, 



Ln,DniK 



n,Dn 



o 



Dn-l 



+ 



Dr,- 



+ 



I 



(log<5ni)V2(fc;^ji/2-^'/. 



0(1). 



By Corollaries B.l and B.3 and an argument similar to that used to prove 
(B.49), 



(B.50) 



{Dn - l)E 



{f(fc„,^j-f(fc;^j}- 

I^n,D„{knn„) 



0(1). 



Consequently, the desired result is ensured by (B.49), (B.50) and Proposition 
2. □ 



Proof of Theorem 2. First note that when $\lim_{n\to\infty}\delta_n = 1$ and
condition (i) [or (ii)] of Theorem 2 are assumed instead of (3.4) and (3.5),
the left-hand sides of (B.33)-(B.35) still converge to 0. Therefore, (B.29)
follows. Let < ^ < (1/2) — ^2 if condition (i) of Theorem 2 holds, and 
< ^ < min{(l/2) - 4, {61/2) - ^2} if condition (h) of Theorem 2 holds. 
Then, by Jensen's inequality and the same reasoning used in the proofs of 
Corollaries B.2 and B.3, we have for any g > 0, 



E 



(B.51) 



E 



f(A;„,5j-f(A: 



2q 



2q 



■ 0(1) and 



0(1). 



Consequently, the claimed result follows from (B.29), (B.51), (2.11) and 
Proposition 2. □ 

APPENDIX C: PROOF OF THEOREM 3 

Instead of verifying Theorem 3 directly, we will first investigate the pre- 

ip •\ 

diction performance of 5; "'(/c), defined in (2.15). By analogy with (4.1) 
of [25], 

S';['-\k)=NLnMk)+Pnk{al{k)-a^) 
(C.l) +{ka^-N\\kn{k)-s.{k)\\\^^^^) 

+ Na'^ + N{Sl^^^_,{k)-al), 

where D„ = P„, for a k x k symmetric matrix A and a /c-dimensional vector 
Yi llylU — y'^Yi ^'^d the definition of Sj^^ „_i(A;) can be found in Lemma 
B.l. Note that the relation, D„ = P„, will be used throughout this appendix. 
Based on (C.l) and an argument similar to that used in (5.34) of [14], we 
have 



(C.2) 



P{klp^ = k)<J2P{\UUk)\ > {l/5)Vn,DAk)), 



1=1 



where K,D„(A:) is defined after (B.2), k^ = argmini<fc<A'„ 



NLnMk) 
NLnMk) 



\UiAk) 

\U2,n{k) 
\U3,n{k) 
\Ui,n{k) 

\U5,n{k) 



Pnk{al{k) - a^)\, 

Pnk:,nM(k*n,Dj-'^')\, 
ka^ -N\\an{k)-a{k)f 



k„ 



n,D„ 



N\\k, 



V/?nJllij„(fc; )l 



and 



S'K„,n-l{k) — ffc - -S*!- „_i(A;* n ) — cr|. 
Theorem C.1. Let the assumptions of Theorem 3 hold. Then (2.8) and
(2.9) hold for A^n.os = ^nPn' '^^^ (4-4) holds with kn,p„ replaced by k^p^- 

Proof. By Lemma B.l and analogies with [14], (5.43) and (5.47), we 
have for q > 0, all 1 <k < and all sufficiently large n, E\Ui^n{k)\'^ < 
C{P^k'iN-'i + N-'il'^), E\U2,n{k)\'' < C{P^k*^'ij^N-'' + N~''/^), E\U3,nik)\'' < 
C{k^m~i/^ + k'^/^)N-iL-^^Jk), E\U,,n{k)\^ < C{k*^'^^N~'i/^ + kl%)N~'^ X 

$L^{-q}_{n,D_n}(k)$ and $E|U_{5,n}(k)|^q \le C\|a(k) - a(k^{*}_{n,D_n})\|^q_R N^{-q/2} L^{-q}_{n,D_n}(k)$. These moment
bounds and an argument similar to that used to verify Corollary B.1
give for q > 0,



(C.3) 



^n,D„(^f,pJ 



lim El 



With the help of (K.1)-(K.6), (4.1), (4.2) and the above moment properties, 
we can follow the ideas used in the proofs of Corollaries B.2 and B.3 to 
obtain that for q> 0, 



E 



(C.4) 



iLn,DM,pJV/' 



2q 



0(«n )-^'-'^^+')+o((^'n-l)-'^) 



+ 0{{Pn - 1) 



where 0 < θ = θ(ξ) < 1 is any exponent obtained from (K.6) with ξ satisfying
(4.3), and



(C.5) 



E 



fiklp)-f{k*. 



{Ln,DM,pJY" 



2q 



^ 0{{Pn 



I) 



Consequently, the claimed result is guaranteed by (3.6) [with ξ satisfying
(4.3)], (C.3)-(C.5) and Proposition 2. □
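Bounds of the form (C.2) rest on an elementary fact: if a sum of m nonnegative terms exceeds a threshold v, then at least one term exceeds v/m, so the probability of the first event is at most the sum of the m individual exceedance probabilities. The following is a minimal numerical sketch of that step only. It assumes, as the shape of (C.2) suggests, that the selection event is contained in the event that the $|U_{i,n}(k)|$ jointly exceed $V_{n,D_n}(k)$. The function name, the half-normal draws and the threshold are hypothetical placeholders, not quantities from the paper.

```python
import random


def union_bound_check(num_terms=5, num_draws=20000, threshold=1.0, seed=0):
    """Compare P(sum_i |U_i| > v) with sum_i P(|U_i| > v / m) on simulated draws.

    The |U_i| here are arbitrary half-normal variables (an assumption made
    purely for illustration); only the pigeonhole/union-bound step is shown.
    """
    rng = random.Random(seed)
    sum_exceeds = 0
    term_exceeds = [0] * num_terms
    for _ in range(num_draws):
        u = [abs(rng.gauss(0.0, 0.3)) for _ in range(num_terms)]
        # Event on the left-hand side: the terms jointly exceed the threshold.
        if sum(u) > threshold:
            sum_exceeds += 1
        # Events on the right-hand side: one term exceeds its share v / m.
        for i, value in enumerate(u):
            if value > threshold / num_terms:
                term_exceeds[i] += 1
    lhs = sum_exceeds / num_draws
    rhs = sum(term_exceeds) / num_draws
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = union_bound_check()
    print("empirical P(sum > v):", lhs)
    print("sum of empirical P(|U_i| > v/m):", rhs)
```

Because the inclusion of events holds draw by draw, the inequality holds for the empirical frequencies as well as for the underlying probabilities. The same step with m = 6 and threshold share 1/6 applies once a sixth term is added.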



Proof of Theorem 3. It suffices to show that (C.3)-(C.5) hold with
$\hat{k}^{S}_{n,P_n}$ replaced by $\hat{k}_{n,P_n}$. Define $G_n(k) = N\exp\{IC_{P_n}(k)\} - S_n^{(P_n)}(k)$ and
$|U_{6,n}(k)| = |G_n(k) - G_n(k^{*}_{n,D_n})| / (N L_{n,D_n}(k))$. Then, by the same reasoning as
in (C.2), $P(\hat{k}_{n,P_n} = k) \le \sum_{i=1}^{6} P(|U_{i,n}(k)| > (1/6)V_{n,D_n}(k))$. Moreover,
Taylor's theorem and [14], (5.42), yield that for q > 0, all $1 \le k \le K_n$, and all
sufficiently large n, $E|U_{6,n}(k)|^q \le C P_n^{2q} K_n^{2q} N^{-2q} L^{-q}_{n,D_n}(k)$. These
inequalities and the same argument used in the proof of Theorem C.1 give the
desired results. □

APPENDIX D: PROOFS OF THEOREMS 4 AND 6

Proof of Theorem 4. First observe that for all sufficiently large n,
(D.1)



where i7„ = o(n) and i7„ > 1. Define h^nik) = {N{t{k) - i{po))^}/{poa^). 
In view of (D.1) and by Hölder's inequality, we have, for all sufficiently large
n,



Po-1 

(D.2) < 5: {E\L,,n{k)n'^'^P^^-'y'-CK,s^ = k) 

k=l 

k=po+l 

where r > 1 and $D_n = D_{APE_\delta}$. According to [14], Proposition 1 and Lemmas
1 and 2, (B.7) and (B.11),



(D.3) ^|/3,„(fc)r<| 



C, l<k<po-l, 
Ck', Po + l<k<Kn, 



as n is sufficiently large. Armed with (D.3), Lemmas B.2-B.4, the conditions
imposed on $\delta_n$, and the fact that for $1 \le k \le K_n$ and $k \ne p_0$, Vnij^ik) < C,
the proof of Corollary B.1 is modified to obtain that for any s > 0, (I) =
$O(n^{-s})$ and (II) = o((log(5-i)-^). Hence, (2.9) holds for $\hat{k}_{n,D_n} = \hat{k}_{n,\delta_n}$.
Similarly, it can be shown that $\hat{k}_{n,\delta_n}$ also satisfies (2.8). In view of Proposition
2 and (D.1), (2.3) [(2.4)] is achieved by $\hat{k}_{n,\delta_n}$. Modifying the proof of
Theorem 3 and the above argument for $APE_{\delta_n}(k)$, it can be shown that (2.8),
(2.9) and (2.3) [(2.4)] can also be verified for $\hat{k}_{n,P_n}$, with $P_n$ satisfying the
imposed constraints, $P_n \to \infty$ as $n \to \infty$ and $P_n = O(n^s)$ for some 0 < s < 1.
The details are omitted in order to save space. □
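The Hölder step invoked for (D.2) bounds the contribution of each candidate order by a moment of the loss times a power of the selection probability. In generic form, for a random variable h, an event A and r > 1, it reads $E(|h| 1_A) \le (E|h|^r)^{1/r} P(A)^{(r-1)/r}$. The sketch below checks this inequality numerically under stated assumptions: the exponential "loss" and the tail event are hypothetical stand-ins, and the function name is introduced only for illustration. It does not reproduce (D.2) itself, whose display is damaged in this copy.

```python
import random


def holder_check(r=2.0, num_draws=10000, cutoff=3.0, seed=1):
    """Check E[h * 1_A] <= (E|h|^r)^(1/r) * P(A)^((r-1)/r) on simulated data.

    h is an exponential variable and A = {h > cutoff}; both are illustrative
    assumptions. The inequality is Hoelder's inequality with conjugate
    exponents r and r / (r - 1), using that an indicator raised to any
    positive power is unchanged.
    """
    rng = random.Random(seed)
    h = [rng.expovariate(1.0) for _ in range(num_draws)]
    indicator = [1.0 if value > cutoff else 0.0 for value in h]

    # Left-hand side: loss restricted to the (rare) event A.
    lhs = sum(value * ind for value, ind in zip(h, indicator)) / num_draws

    # Right-hand side: r-th moment of the loss times a power of P(A).
    moment = (sum(value ** r for value in h) / num_draws) ** (1.0 / r)
    prob = sum(indicator) / num_draws
    rhs = moment * prob ** ((r - 1.0) / r)
    return lhs, rhs


if __name__ == "__main__":
    for r in (1.5, 2.0, 4.0):
        lhs, rhs = holder_check(r=r)
        print("r =", r, "lhs =", lhs, "rhs =", rhs)
```

Hölder's inequality holds for any probability measure, including the empirical one, so the comparison holds for the simulated averages themselves. The practical point of the step is that a small selection probability for a wrong order offsets a loss moment that may grow with the order.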

Proof of Theorem 6. Unlike the previous theorems, Proposition 2
is not applied in the proof of Theorem 6 since the penalty term associated
with k^\ 2/rf. „ /K, , ^ 1 + Pnht 1, is random. In the following,
we shall directly verify that

(D.4) limsup ≤ 1.

First assume that condition (ii) of Theorem 6 holds. Let 0 < ξ < min{δ_1/2, 1/2,
(1 + δ_1) - Li(2 + δ_1)}. Then there are 0 < θ = θ(ξ) < 1 and M = M(ξ) > 0
such that (5.2) is satisfied. Define



^n,P,AK,Pn 



where M* is some positive constant, and h^nik) = {t{k) + S{k))'^ / Lnik^) ■ 
Then 

Ln{kn) 

= E{h,n{U^)} 

(D.5) <^{/4,n(fen,2)} 

^(i) + (n). 

Observe that for r > 1, 

= f: E{h^n{k)I^ }+ f: i<;{/4,„(/c)/^^. ^ 

fc=l " fc=l 



fceP, 



^C7{(III) + (IV)}, 

where the second inequality follows from Hölder's inequality and the fact
that for all $1 \le k \le K_n$, $E|f(k) + S(k)|^{2r} \le C L_n^{r}(k)$, which is ensured by Wei
([28], Lemma 2), Ing and Wei ([14], Proposition 1 and Lemmas 1 and 2) and
(B.16). To deal with (III), by (5.3) and the definition of $B_{n,M^*}$, we have, for
all sufficiently large n,



5„,M*n{i,2,...,K„.}cAp^„, 



Hence, (5.2) ensures that for all $k \in B_{n,M^*} \cap \{1, 2, \ldots, K_{n_1}\}$ and sufficiently
large n,

(fc) = {^nsP„.(A;)}{L„,,p„, (A;)-L„.,p„,(A;:.,p^J}-i < C(A:;.,p,J«. 

(D.7) 

The definition of $B_{n,M^*}$ also yields for all $k \in B_{n,M^*}$ and $P_n \ge 2$, $L_n(k)/L_n(k^{*}_n) \le
(P_n - 1)\{L_{n,P_n}(k)/L_{n,P_n}(k^{*}_{n,P_n})\} \le C P_n$. According to this, (D.7) and the
moment bounds for $|U_{i,n}(k)|$, i = 1, ..., 6, we have, for q > 0 and all
sufficiently large n,

^ piik'^+k*:.^pj ^ k^'^+ki^p^^ 



( J<n^ pi 

(iii)<cp„(fc;,p^j«'' — 

I k=l 



1 A:'?/2 + k*''l 
(D.8) + ^ + 



NUl^pJk) 
^ l|a(fc)-a(fc:,p^„)||^ ^ K'jplf 



N^^'Ll^pJk) Nf'Ll^pJk)}' 

By taking q on the right-hand side of (D.8) large enough and in view of the
restrictions on ξ (given at the beginning of this proof), l and $P_n$,

(D.9) (III) = o(1).

Similarly, (5.2) and the definition of $B_{n,M^*}$ imply that for all sufficiently
large n and $k \in \{k : 1 \le k \le K_n$ and $k \notin B_{n,M^*}\}$, (D.7) is still valid if $n_1$ and
$P_{n_1}$ are replaced by n and $P_n$, respectively. This finding, the fact that for
$P_n \ge 2$, $L_n(k)/L_n(k^{*}_n) \le (P_n - 1)\{L_{n,P_n}(k)/L_{n,P_n}(k^{*}_{n,P_n})\}$, and an argument
similar to the one used to verify (D.9) give (IV) = o(1), which, together with
(D.5), (D.6), (D.9) and Theorem 2 of [14], yields (D.4).

Next, assume that condition (i) holds. By the moment bounds for $|U_{i,n}(k)|$,
i = 1, ..., 6, and similar reasoning to that used in the proof of Theorem 4,
we have

(D.10) $\lim_{n \to \infty} P(\hat{k}_{n,P_n} \ne p_0) = 0,$

and for any q > 0,

(D.11) $E|l_{4,n}(\hat{k}_{n,2})|^q = O(1).$

Since



(D.12) "^"^^"^ 

(D.4) follows from (D.10)-(D.12), Hölder's inequality and $\limsup_{n \to \infty}
E\{l_{4,n}(\hat{k}_{n,P_n})\} \le 1$ (which is ensured by Theorem 4). This completes the
proof of the theorem. □